Abstract

In applications of distributed storage systems to distributed computing and implementation of key-value
stores, the following property, usually referred to as consistency in computer science and engineering,
is an important requirement: as the data stored changes, the latest version of the data must be
accessible to a client that connects to the storage system. An information theoretic formulation called 
multi-version coding is introduced in the paper, in order to study storage costs of consistent distributed 
storage systems. Multi-version coding is characterized by v totally ordered versions of a message, and a 
storage system with n servers. At each server, values corresponding to an arbitrary subset of the v versions 
are received and encoded. For any subset of c servers in the storage system, the value corresponding to 
the latest common version, or a later version as per the total ordering, among the c servers is required to 
be decodable. An achievable multi-version code construction via linear coding and a converse result that 
shows that the construction is approximately tight, are provided. An implication of the converse is that 
there is an inevitable price, in terms of storage cost, to ensure consistency in distributed storage systems. 


I. Introduction 

There has been enormous recent interest in understanding the role of erasure coding in
distributed storage systems. In this paper, we formulate a new information theoretic problem,
the multi-version coding problem, motivated by applications of distributed storage systems to
distributed computing and implementation of key-value stores. The multi-version coding problem
captures two aspects that have not been considered previously in information theoretic studies of
distributed storage systems:

i) In several applications, the message (data) changes, and the user wants to get the latest version 
of the message. In computer science literature [1], the notion of obtaining the latest version of 
the data is known as consistency.

ii) There is an inherent asynchrony in storage systems due to the distributed nature of the system. 
As a consequence, the new version of the message may not arrive at all servers in the system 
at the same time. 

The design of a consistent data storage service over an asynchronous distributed storage system has 
been studied carefully in distributed computing theory literature [1], [4], and forms an integral part 
of several data storage products used in practice, such as Amazon Dynamo [5], Apache Cassandra 
[6], and CouchDB [7]. The main objective of the multi-version coding problem is to understand the
storage costs of consistent distributed storage systems from an information theoretic perspective.
We begin with an informal description of the problem. We discuss the background and motivation 
of our problem formulation in Section I-B. 

A. Informal Problem Description 

Our problem formulation is pictorially depicted in Fig. 1. Consider a distributed storage system 
with a set of n servers. Suppose that it stores message $W_1$ using a length-n code, such that a
decoder can connect to any subset of c servers and decode $W_1$. Suppose an updated version of the
message, $W_2$, enters the system. For reasons that may be related to network delays or failures, $W_2$
arrives at some subset of servers, but not others. We assume that each server is unaware of which
servers have $W_2$ and which do not. The question of interest here is to design a storage strategy for
the servers so that a decoder can connect to any c servers and decode the latest common version
among the c servers, or some version later than the latest common version. That is, $W_2$ must
be decodable from every set of c servers where each server in the set has received both $W_1$ and
$W_2$. For every set of c servers where there is at least one server which has not received $W_2$, we
require that either $W_1$ or $W_2$ is decodable. We intend our storage strategy to be applicable to every
possible message arrival scenario, and every possible subset of servers of size c. A possible scenario 
is depicted in Fig. 1 for n = 3, c = 2. 

Notice that in the storage strategy, a server with both $W_1$ and $W_2$ stores a function of $W_1$ and $W_2$,
whereas a server with only $W_1$ stores a function of $W_1$. We now describe two simple approaches,
replication and simple erasure coding, that solve this problem. We assume that the sizes of both
versions are equal, that is, the number of bits used to represent $W_1$ is equal to that of $W_2$. We refer to the
size of one version as one unit.

• Replication: In this strategy, we assume that each server stores the latest version it receives, 
that is, servers with both versions store $W_2$, and servers with the first version store $W_1$. Notice
that the storage cost of this strategy is 1 unit per server, or a total of n units. See Table I for 
an example. 

• Simple Erasure Coding: In this strategy, we use two (n, c) MDS (maximum distance separable) 
codes, one for each version separately. A server stores one codeword symbol corresponding to 
every version it receives. So, a server with both versions stores two codeword symbols resulting 
in a storage cost of 2/c units, whereas a server with only the first version stores 1/c unit. Notice
that for the worst case where all servers have both versions, the total storage cost per server
is 2/c units. See Table II for an example.
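To make the simple erasure coding strategy concrete, the following Python sketch implements the (3, 2) single-parity code used in Table II on one 4-bit version. It is only an illustration: the function names `mds_encode` and `mds_decode` are ours, and the code handles this one small parameter choice rather than a general (n, c) MDS code.

```python
from itertools import combinations

def mds_encode(bits):
    """Split a 4-bit version into two 2-bit halves and add their XOR as parity."""
    a, b = tuple(bits[:2]), tuple(bits[2:])
    parity = tuple(x ^ y for x, y in zip(a, b))
    return [a, b, parity]  # codeword symbols for Servers 1, 2, 3 (indices 0, 1, 2)

def mds_decode(symbols):
    """Recover the version from any two symbols, given as {server_index: symbol}."""
    (i, x), (j, y) = sorted(symbols.items())
    xor = tuple(s ^ t for s, t in zip(x, y))
    if (i, j) == (0, 1):
        a, b = x, y
    elif (i, j) == (0, 2):
        a, b = x, xor   # second half = first half XOR parity
    else:               # (i, j) == (1, 2)
        a, b = xor, x   # first half = second half XOR parity
    return a + b

w1 = (1, 0, 1, 1)
codeword = mds_encode(w1)
for pair in combinations(range(3), 2):
    assert mds_decode({k: codeword[k] for k in pair}) == w1
```

Each stored symbol is 2 bits, that is, 1/c = 1/2 unit per version, which is where the 2/c worst-case cost for two versions comes from.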

We use worst-case storage costs to measure the performance of our codes for simplicity. Therefore,
the per server storage cost of replication is equal to 1 unit, and that of the simple erasure coding
strategy is equal to 2/c units. The Singleton bound provides a natural information theoretic lower
bound on the storage cost. In particular, the Singleton bound implies that each node has to store
at least 1/c units, even for storing a single version. A natural question of interest is whether we can
achieve a storage cost of 1/c or whether a new information theoretic lower bound can be found. It is
useful to note that asynchrony makes the problem non-trivial. In a synchronous setting, where all
the servers receive all the versions at the same time, an MDS code-based strategy where each server
stores a codeword symbol corresponding to the latest version received suffices. So, in a synchronous
setting, the Singleton bound would be tight.
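The three reference points discussed above can be tabulated with a few lines of Python. This is a minimal sketch under the two-version setting of this section; the helper names are hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def replication_cost():
    # Each server stores one full version: 1 unit.
    return Fraction(1)

def simple_erasure_coding_cost(c, versions=2):
    # Worst case: a server holds every version and stores one (n, c) MDS
    # symbol of size 1/c unit per version.
    return Fraction(versions, c)

def singleton_bound(c):
    # Lower bound that already applies when a single version is stored.
    return Fraction(1, c)

for c in (2, 3, 5, 10):
    print(c, replication_cost(), simple_erasure_coding_cost(c), singleton_bound(c))
```

For c = 2 the simple erasure coding cost equals that of replication, and for larger c it is twice the Singleton bound.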

Our main achievability result provides a code construction, which shows that replication and 
simple MDS codes are both sub-optimal. It is worth noting that we do not make any assumptions 
on the correlation between the two versions. Even with our conservative modeling assumption which 
ignores possible correlation among the versions, we can construct achievable coding schemes that, 
albeit mildly, improve upon simple erasure coding and replication.

Fig. 1 (panels (a), (b), (c)). Storing a file with 2 versions in n = 4 nodes. From any c = 2 nodes, the code should recover the latest common
version or something later. We denote the old and new versions as Version 1 and Version 2 respectively.




Scenario                                     Versions        Server 1   Server 2   Server 3
Initially, Ver. 1 available at all servers   Ver. 1          W_1        W_1        W_1
Then, Ver. 2 reaches Servers 1 and 2         Ver. 1, Ver. 2  W_2        W_2        W_1

TABLE I

Replication for n = 3, c = 2 with two versions. A server stores W_1 if it receives only Version 1; it stores W_2 if it receives
both versions. Two possible scenarios are shown in the table. Note that the latest version is decodable from every 2 servers in
both scenarios. In general, it can be verified that the latest common version or a later version is decodable from every c = 2
servers, in every possible scenario. The storage cost is 1 unit.


Our main converse result shows that in the asynchronous setting that we study, the Singleton
bound is not tight and that the storage cost for any multi-version code cannot be close to 1/c. Our
converse implies that there is an inherent, unavoidable cost of ensuring a consistent storage service
because of the asynchrony in the system.

For the setting described where there are two versions, we provide in this paper a code construction
that achieves a per server storage cost of 2/(c+1) for odd c. When c is even, we achieve a storage
cost of 2(c+1)/(c(c+2)). Table III provides an example of our construction with storage cost of 3/4 unit,
for n = 3, c = 2. Note that our construction outperforms replication and simple MDS codes. We
provide in this paper a converse that shows that the worst case storage cost cannot be smaller than
2/(c+1) under some mild assumptions. The converse implies that our code construction is essentially
optimal for odd values of c.
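The claim that the code of Table III works in every scenario can be checked by brute force. The sketch below is ours and makes two assumptions: every server holds Version 1, as in the informal setting of this section, and the decoder knows which servers have received Version 2. A version is counted as decodable from a pair of servers when the stored symbols determine it uniquely.

```python
from itertools import combinations, product

def encode(server, has_v2, p, q):
    """Symbol stored by `server` (0, 1, 2) under the code of Table III."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    p5 = p1 ^ p2 ^ p3 ^ p4
    if not has_v2:  # server state {1}
        return [(p1, p2, p3), (p1, p2, p4), (p1, p2, p5)][server]
    # server state {1, 2}
    return [(p3, q1, q2), (p4, q3, q4), (p5, q1 ^ q3, q2 ^ q4)][server]

def decodable_versions(servers, states):
    """Versions that are uniquely determined by what the given servers store."""
    seen = {1: {}, 2: {}}
    ok = {1: True, 2: True}
    for p in product((0, 1), repeat=4):
        for q in product((0, 1), repeat=4):
            obs = tuple(encode(s, st, p, q) for s, st in zip(servers, states))
            for version, value in ((1, p), (2, q)):
                # Two message values with the same observation rule the version out.
                if seen[version].setdefault(obs, value) != value:
                    ok[version] = False
    return {v for v, good in ok.items() if good}

def check_table_iii():
    for servers in combinations(range(3), 2):
        for states in product((False, True), repeat=2):
            latest_common = 2 if all(states) else 1
            found = decodable_versions(servers, states)
            if not any(v >= latest_common for v in found):
                return False
    return True

print(check_table_iii())
```

Every server stores 3 bits in either state, which matches the stated cost of 3/4 unit for 4-bit versions.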

In this paper, we study a generalization of the above problem. In a system with n servers and 
v versions, a multi-version code allows every server to receive any subset of the v versions. Every 
server encodes according to the versions that it received. The decoder takes as input, codeword 
symbols of an arbitrary set of c servers, c < n, and recovers the latest common version among 
these servers, or some version later. The storage cost is the worst-case storage size per server over 
all possible scenarios, that is, over all possible subsets of versions corresponding to the servers. In 
this paper, we provide an information-theoretic characterization of the storage cost of such codes, 
including code constructions and lower bounds for given parameters n, c, v. 

B. Background and Motivation 

The multi-version coding problem is characterized by two new aspects in its formulation: (i) the 
idea of consistency in the decoder, and (ii) the asynchrony in the distributed storage system. 
We describe some motivating applications and related background literature that inspire our 
formulation here. 

Storing multiple versions of the same message consistently is important in several applications. 
For instance, the idea of requiring the latest version of the object is important in shared memory

Scenario                                     Versions        Server 1            Server 2            Server 3
Initially, Ver. 1 available at all servers   Ver. 1          (p1, p2)            (p3, p4)            (p1 ⊕ p3, p2 ⊕ p4)
Then, Ver. 2 reaches Servers 1 and 2         Ver. 1, Ver. 2  (p1, p2), (q1, q2)  (p3, p4), (q3, q4)  (p1 ⊕ p3, p2 ⊕ p4)

TABLE II

Simple erasure coding for n = 3, c = 2 with two versions. Assume each unit is 4 bits, and the bits of the two versions are
W_1 = (p1, p2, p3, p4), W_2 = (q1, q2, q3, q4). Every version is coded with a (3, 2) MDS code where each codeword symbol is
a 2-bit vector. The 3 codeword symbols for the 3 servers are (p1, p2), (p3, p4), (p1 ⊕ p3, p2 ⊕ p4) for Version 1, and
(q1, q2), (q3, q4), (q1 ⊕ q3, q2 ⊕ q4) for Version 2. A server stores its corresponding codeword symbol of Version 1 if it has
only Version 1; it stores codeword symbols of both Version 1 and Version 2 if it receives both versions. Two possible
scenarios are shown in the table. It can be verified that the latest common version is decodable from every c = 2 servers, in
every possible scenario. The storage cost is 4 bits, or equivalently, 1 unit.




Scenario                                     Versions        Server 1      Server 2      Server 3
Initially, Ver. 1 available at all servers   Ver. 1          p1, p2, p3    p1, p2, p4    p1, p2, p5
Then, Ver. 2 reaches Servers 1 and 2         Ver. 1, Ver. 2  p3, q1, q2    p4, q3, q4    p1, p2, p5

TABLE III

Proposed code for n = 3, c = 2 and two versions. Assume each unit is 4 bits, and the two versions are W_1 = (p1, p2, p3, p4),
W_2 = (q1, q2, q3, q4). Here p5 = p1 ⊕ p2 ⊕ p3 ⊕ p4. Server 1 stores (p1, p2, p3) if it receives only Version 1; it stores
(p3, q1, q2) if it receives both versions. Server 2 stores (p1, p2, p4) if it receives only Version 1; it stores (p4, q3, q4) if it
receives both versions. Server 3 stores (p1, p2, p5) if it receives only Version 1; it stores (p5, q1 ⊕ q3, q2 ⊕ q4) if it receives
both versions. Two possible scenarios are shown in the table. It can be verified that the latest common version or a later
version is decodable from every 2 servers, in every possible scenario. The storage cost is 3 bits, or equivalently, 3/4 unit.


systems [1] that form the cornerstone of theory and practice of multiprocessor programming [8]. In 
particular, when multiple threads access the same variable, it is important that the changes made 
by one thread to the variable are reflected when another thread reads this variable. Another natural 
example comes from key value stores, for instance, applied to storing data in a stock market, where 
acquiring the latest stock value is of significant importance. 

Asynchrony is inherent to the distributed nature of the storage systems used in practice. In 
particular, asynchrony occurs due to temporary or permanent failures of servers, or of transmission 
between the decoders and the servers. Indeed, the default model of study in storage systems in 
the distributed algorithms literature assumes that communication links can have arbitrarily large 
delays [1]. Since it is more difficult to achieve synchronization in larger systems, asynchrony is an 
arguably justified modeling choice for distributed storage systems which are expected to scale in 
practice to cope with rising demands. 

The problem of storing multiple versions of the data consistently in distributed asynchronous 
storage systems forms the basis of celebrated results in distributed computing theory [4]. From 
a practical perspective, algorithms designed to ensure consistency in asynchronous environments 
form the basis of several commercial storage products [5], [9], [7], [6]. We refer the reader to [5] for 
a detailed description of the Amazon Dynamo key value store, which describes a replication-based 
data storage solution. While [4], [5] use replication-based techniques for fault tolerance, the idea of 
using erasure coding for consistency has been used in recent distributed computing literature [10], 
[11], [12], [13]. In fact, these references use the idea of simple erasure coding that we referred to in 
Section I-A. 

We note that the idea of storing versioned data has acquired some recent interest in information 
theory literature. In particular, some of the challenges of updating data in distributed storage 
systems have been studied in [14], [15], [16], [17]. These works complement our paper, and their ideas 
can perhaps be adapted to our framework to build efficient consistent data storage implementations.

C. Contributions and Organization

Multi-version coding provides an information theoretic perspective to the problem of ensuring 
a consistent data storage service over an asynchronous distributed storage system. Our problem 
formulation is geared towards optimizing the storage cost per server node. We describe the
multi-version coding problem formally in Section II. In Section III, we formally state our main results:
a multi-version code construction that has a lower cost compared to replication and naive erasure 
coding, and an information theoretic lower bound on the storage cost. The proofs of the main results 
are provided in Sections IV, V and VI. In Section VII, we demonstrate the utility of multi-version 
codes using a toy model of asynchronous distributed storage systems. We discuss related areas of 
future work in our concluding section, Section VIII. 

We describe our achievable multi-version code constructions in Section IV. The constructions use
a simple linear coding scheme without coding across versions. Moreover, our code construction
satisfies a causality property (defined in Section II) that enables easier implementation, because 
our encoding strategy is agnostic to the order of arrival of the various message versions at the 
servers. 

In Sections V and VI, we prove lower bounds on the storage cost for $\nu = 2$ versions and arbitrary
$\nu$, respectively. Our lower bounds imply that our code constructions are essentially optimal for
certain families of parameters, and are close to optimal in general. It is worth noting that our
problem formulation allows for all possible methods of encoding the versions. In particular, servers 
can encode multiple versions together, and use possibly non-linear methods of encoding the data. 
The tightness of our converse shows that, perhaps surprisingly, encoding each version separately 
using linear codes is close to optimal. 

From a technical standpoint, the lower bound argument is interesting and challenging, especially 
when the number of versions v is larger than 2. This is because, in commonly studied settings in 
multi-user information theory, the decoder has a specific set of messages that it wants to recover 
reliably. In contrast, in multi-version coding, a decoder is allowed to recover any one of a subset 
of messages correctly. As a consequence of the relatively unusual decoding constraint, commonly 
used methods of deriving converses need to be modified appropriately to obtain our lower bound. 
We provide a more detailed discussion on the technical aspects of the converse in Section III. 

In Section VII, we describe a toy model of distributed storage that explicitly includes an arrival 
model for new versions and channel models for the links between the encoders, servers and
decoders. We demonstrate the utility of multi-version codes in understanding the storage costs 
over the described toy model. Our study in Section VII provides a more refined understanding of 
the parameters of the multi-version coding problem in terms of the characteristics of a distributed 
storage system. Readers who are interested in understanding the applications of multi-version codes,
but not the technical details of the construction and the converse, can skip Sections IV-VI and read 
Section VII. 


II. System Model: Multi-version Codes 

We begin with some notation. For integers $i \le j$, we use $[i, j]$ to represent the set $\{i, i+1, \ldots, j\}$.
For integers $i > j$, we define $[i, j]$ as the empty set. We use $[j]$ to represent the set $[1, j]$, and $[j]$ is
an empty set if $j \le 0$. For any set of indices $S = \{s_1, s_2, \ldots, s_{|S|}\} \subset \mathbb{Z}$ where $s_1 < s_2 < \ldots < s_{|S|}$,
and for any ensemble of variables $\{X_i : i \in S\}$, we denote the tuple $(X_{s_1}, X_{s_2}, \ldots, X_{s_{|S|}})$ by $X_S$.
For a set $\{v_1, \ldots, v_n\}$ of elements, we use $v_S$ to denote the set $\{v_i : i \in S\}$. If S is empty, then $v_S$ is
defined to be the empty set. For sets $S \subseteq T$, we write $T - S$ to be the set difference $\{i : i \in T, i \notin S\}$.
We use log to represent log base 2.

We now define the multi-version coding problem. We begin with an informal definition, and
present the formal definition in Definition 1. The multi-version coding problem is parameterized
by positive integers n, c, $\nu$, M and q. We consider a setup with n servers. Our goal is to store $\nu$
independent versions of the message, where each version of the message is drawn from the set $[M]$.
We denote the value of the ith version of the message by $W_i \in [M]$ for $i \in [\nu]$. Each server stores
a symbol from $[q]$. Therefore, $\log q$ can be interpreted as the number of bits stored in a server.
Every server receives an arbitrary subset of the versions. We denote $S(i) \subseteq [\nu]$ to be the set of
versions received by the ith server. We refer to the set $S(i)$ as the state of the ith server. We refer
to $\mathbf{S} = (S(1), \ldots, S(n)) \in \mathcal{P}([\nu])^n$ as the system state, where $\mathcal{P}([\nu])$ denotes the power set of $[\nu]$. For
the ith server, denoting its state $S(i)$ as $S = S(i) = \{s_1, s_2, \ldots, s_{|S|}\}$ where $s_1 < s_2 < \ldots < s_{|S|}$,
the ith symbol of the codeword is generated by an encoding function $\varphi_S^{(i)}$ that takes an input
$W_S = (W_{s_1}, W_{s_2}, \ldots, W_{s_{|S|}})$, and outputs an element in $[q]$.

We assume that there is a total ordering on the versions: if $i < j$, then $W_j$ is interpreted as
a later version of the message as compared with $W_i$. For any set of servers $T \subseteq [n]$, we refer to
$\max \cap_{i \in T} S(i)$ as the latest common version in the set of servers T. The purpose of multi-version
code design is to generate encoding functions such that, for every subset $T \subseteq [n]$ of c servers,
a message $W_m$ should be decodable from the set T, where $m \ge \max \cap_{i \in T} S(i)$, for every possible
system state. The goal of the problem is to find the smallest possible storage cost per bit stored,
or more precisely, to find the smallest possible value of $\log q / \log M$ over possible multi-version codes
with parameters n, c, $\nu$, M, q.
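As a small illustration of the decoding requirement, the sketch below computes the latest common version among a set of servers and the set of versions a decoder is permitted to return. The representation of the system state as a Python dictionary from server index to a set of version indices, and the function names, are our own choices.

```python
def latest_common_version(system_state, servers):
    """Max of the intersection of S(i) over i in `servers`; None if it is empty."""
    common = set.intersection(*(set(system_state[i]) for i in servers))
    return max(common) if common else None

def acceptable_versions(system_state, servers, num_versions):
    """Versions m that the decoder may return: all m >= the latest common version."""
    latest = latest_common_version(system_state, servers)
    if latest is None:
        return set()  # no common version, so nothing is required of the decoder
    return set(range(latest, num_versions + 1))

# Ver. 2 has reached Servers 1 and 2 but not Server 3.
state = {1: {1, 2}, 2: {1, 2}, 3: {1}}
print(latest_common_version(state, (1, 2)))      # Servers 1 and 2 share Ver. 2
print(acceptable_versions(state, (1, 3), 2))     # either version is acceptable
```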

We present a formal definition next. 


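Definition 1 (Multi-version code) An (n, c, $\nu$, M, q) multi-version code consists of

• encoding functions

$$\varphi_S^{(i)} : [M]^{|S|} \to [q],$$

for every $i \in [n]$ and every $S \subseteq [\nu]$, and

• decoding functions

$$\psi_{\mathbf{S}}^{(T)} : [q]^c \to [M] \cup \{\text{NULL}\},$$

for every set $\mathbf{S} \in \mathcal{P}([\nu])^n$ and set $T \subseteq [n]$ where $|T| = c$,
that satisfy

$$\psi_{\mathbf{S}}^{(T)}\left(\varphi_{S(t_1)}^{(t_1)}(W_{S(t_1)}), \ldots, \varphi_{S(t_c)}^{(t_c)}(W_{S(t_c)})\right) =
\begin{cases}
W_m \text{ for some } m \ge \max \cap_{i \in T} S(i), & \text{if } \cap_{i \in T} S(i) \ne \emptyset, \\
\text{NULL}, & \text{o.w.},
\end{cases} \qquad (1)$$

for every $W_{[\nu]} \in [M]^{\nu}$, where $T = \{t_1, t_2, \ldots, t_c\}$, $t_1 < \cdots < t_c$.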


Remark 1 Suppose $M \ge \nu$, and let $\mathbf{S}$ be the n-tuple server state. Consider servers $T \subseteq [n]$, $|T| = c$,
and the union of their states $S' = \cup_{t \in T} S(t)$. Then for any given tuple $W_{[\nu]}$, the decoding function $\psi_{\mathbf{S}}^{(T)}$
decodes either NULL, or a value that is equal to $W_j$, for some version $j \in S'$.

We normalize the storage cost by the size of one version, that is, $\log M$.

Definition 2 (Storage cost of an (n, c, $\nu$, M, q) multi-version code) The storage cost of an (n, c, $\nu$, M, q)
multi-version code is defined to be equal to $\log q / \log M$.

As mentioned in the introduction, replication, where the latest version is stored in every server,
i.e., $\varphi_{S(i)}^{(i)}(W_{S(i)}) = W_{\max(S(i))}$, incurs a storage cost of 1. An alternate strategy would be to separately
encode every version using an MDS code of length n and dimension c, with each server storing an
MDS codeword symbol corresponding to every version that it has received. Such a coding scheme
would achieve a storage cost of $\nu/c$, for sufficiently large q.

For parameters n, c, $\nu$, the goal of the multi-version coding problem is to find the infimum, taken
over the set of all (n, c, $\nu$, M, q) codes, of the quantity $\log q / \log M$.

It is useful to understand the connection of the parameters of the multi-version coding problem 
and the physical characteristics of a distributed storage system. The parameter n naturally represents
the number of servers across which we intend to encode the data of the storage system. The
parameter c is connected to the failure tolerance; in particular, an (n, c, $\nu$, M, q) multi-version code
can protect against $n - c$ server failures since the latest common version is recoverable among any
c nodes. In Section VII, we show through a toy model of distributed storage, that the parameter v 
is related to the degree of asynchrony in the system. 

Notice that in our definition, the encoding function of each server depends only on the subset of 
versions that has arrived at the server, but not on the order of the arrival of the versions. From a 
practical standpoint, it could be useful to modify the definition of multi-version codes to let the 
encoding function depend on the order of arrival of the versions. However, in this paper, we use a 
different approach. We introduce the notion of causal multi-version codes that obviates the need 
for incorporating the order of arrival in the definition. 

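Definition 3 (Causal codes) A multi-version code is called causal if the encoding function satisfies: for
all $S \subseteq [\nu]$, $j \in S$, $i \in [n]$, there exists a function

$$\hat{\varphi}_{S,j}^{(i)} : [q] \times [M] \to [q],$$

such that

$$\varphi_S^{(i)}(W_S) = \hat{\varphi}_{S,j}^{(i)}\left(\varphi_{S - \{j\}}^{(i)}(W_{S - \{j\}}),\, W_j\right).$$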

To understand the notion of causal codes, imagine that a sequence of versions arrive at a server in 
an arbitrary order. If a causal multi-version code is used, then the encoding function at the server is
only a function of its stored information and the value of the arriving version. We anticipate causal 
multi-version codes to be more relevant to practical distributed storage systems than non-causal 
codes. In fact, we demonstrate the utility of causal multi-version codes in storage systems through 
our toy model of distributed storage in Section VII. All the code constructions that we present in 
this paper are causal. 


III. Main Results 

In this section, we formally present the main results of this paper: Theorem 1, which states 
the storage cost of an achievable code construction, and Theorem 2, which states the result of a 
converse that lower bounds the storage cost of an arbitrary multi-version code. We present and 
discuss Theorem 1 in Section III-A. We present and discuss Theorem 2 in Section III-B. 


A. Achievability 

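Theorem 1 Given parameters (n, c, $\nu$), there exists a causal (n, c, $\nu$, M, q) multi-version code with a
storage cost that is equal to

$$\max\left\{ \frac{\nu}{c} - \frac{\nu - 1}{tc},\ \frac{1}{t} \right\},$$

where

$$t = \begin{cases}
\left\lceil \frac{c-1}{\nu} \right\rceil + 1, & \text{if } c \ge (\nu - 1)^2, \\
\left\lceil \frac{c}{\nu - 1} \right\rceil, & \text{if } c < (\nu - 1)^2.
\end{cases}$$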


The achievable scheme of Theorem 1 has a strictly smaller storage cost as compared with 
replication and simple MDS codes. In particular, if $\nu$ is comparable to c, our achievable code
constructions could improve significantly upon replication and simple MDS codes. If $\nu = c - 1$, our
storage cost is approximately half the storage cost of the minimum of replication and simple MDS
codes for large values of c. It is instructive to note that if $\nu \mid (c - 1)$, the storage cost is $\frac{\nu}{c + \nu - 1}$.
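The storage cost of Theorem 1 is easy to evaluate numerically. The sketch below encodes the expression as we read it, namely max{ν/c − (ν−1)/(tc), 1/t} with t = ⌈(c−1)/ν⌉ + 1 when c ≥ (ν−1)² and t = ⌈c/(ν−1)⌉ otherwise; the function names are hypothetical, and `v` stands for the number of versions ν.

```python
from fractions import Fraction
from math import ceil

def theorem1_t(c, v):
    """The integer t of Theorem 1 (as read above) for c servers and v versions."""
    if c >= (v - 1) ** 2:
        return ceil(Fraction(c - 1, v)) + 1
    return ceil(Fraction(c, v - 1))

def theorem1_cost(c, v):
    """Achievable storage cost, in units of one version."""
    t = theorem1_t(c, v)
    return max(Fraction(v, c) - Fraction(v - 1, t * c), Fraction(1, t))

for c, v in [(2, 2), (3, 2), (4, 2), (4, 3), (10, 9)]:
    baseline = min(Fraction(1), Fraction(v, c))  # replication vs. simple MDS
    print(c, v, theorem1_cost(c, v), baseline)
```

Under this reading, the case c = 2, ν = 2 reproduces the 3/4 unit of Table III, the case ν | (c − 1) gives ν/(c + ν − 1), and ν = c − 1 with c = 10 gives 1/2 against a baseline of 9/10.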

Our code constructions are quite simple since we do not code across versions. The main idea of 
our approach is to carefully allocate the storage “budget” of log q among the various versions in a 
server’s state, and for each version, store an encoded value of the allocated size. 

In [18], we studied a special case of the multi-version code that decodes only the latest common 
version. Here, we allow the decoder to return a version later than the latest common version. It 
is interesting to note that, under the relaxed definition of multi-version coding presented here, the 
converse of [18] is not applicable. In fact, the achievable scheme of Theorem 1 achieves a storage 
cost that is lower than the storage cost lower bound of [18] by exploiting the fact that a version 
that is later than the latest common version can be recovered. We plot the performance of Theorem 
1 in Fig. 2. 

B. Converse 

Theorem 2 An (n, c, $\nu$, M, q) multi-version code with $n \ge c + \nu - 1$ and $M \ge \nu$ must satisfy

$$\frac{\log q}{\log M} \ge \frac{\nu}{c + \nu - 1} - \frac{\log\left(\nu^{\nu} \binom{c + \nu - 1}{\nu}\right)}{(c + \nu - 1) \log M}.$$

In the lower bound expression of Theorem 2, the second term on the right hand side vanishes as
$\log M$ grows. For the case of $\nu = 2$ versions, we show a somewhat stronger result in Section V.
In particular, for $\nu = 2$, we show that the second term in the theorem can be improved to be

$$\frac{\log c}{(c + 1) \log M}.$$

The lower bound of Theorem 2 indicates that the storage cost, as a function of M, is at least
$\nu/(c + \nu - 1) + o(1)$. When $\nu \mid (c - 1)$, the storage cost of Theorem 1 approaches the lower bound
of Theorem 2 as $\log M$ grows, and is therefore asymptotically optimal. The multi-version coding
problem remains open when $\nu \nmid (c - 1)$. We establish a connection between the parameter $\nu$ and the
degree of asynchrony in a storage system in Section VII. The converse of Theorem 2 in combination 
with the achievable scheme of Theorem 1 therefore implies that the greater the degree of asynchrony 
in a storage system, the higher the storage cost. In particular, as v tends to infinity, the storage 
cost is one. Therefore, in the limit of infinite asynchrony, the gains of erasure coding vanish, and 
replication is essentially optimal. 
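The behaviour of the leading term ν/(c + ν − 1) of the converse can be seen with a short computation. This is only a sketch of that leading term, with a hypothetical function name; it ignores the vanishing correction term of Theorem 2, and `v` stands for ν.

```python
from fractions import Fraction

def converse_leading_term(c, v):
    """Leading term v / (c + v - 1) of the storage cost lower bound."""
    return Fraction(v, c + v - 1)

c = 5
for v in (1, 2, 10, 100, 10_000):
    print(v, float(converse_leading_term(c, v)))
```

For a single version the term equals the Singleton bound 1/c, and for fixed c it increases towards 1, the cost of replication, as the number of versions grows.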

The assumption that log M grows while c, v are kept fixed is a reasonable first order assumption 
in our study of storage costs because, in systems where storage cost is large, the file size is typically 
large. The study of multi-version codes for finite M is, nonetheless, an interesting open problem. 

In the lower bound proofs, we develop an algorithm that finds a system state that requires a large 
storage cost per server to ensure correct decoding. Our approach to deriving the converse has some 
interesting conceptual aspects. The standard approach to derive converses for a noiseless multi-user 
information theory problem is as follows: (i) express the encoder and decoder constraints using 
conditions on the entropy of the symbols, (ii) use Shannon information inequalities to constrain the 
region spanned by the entropies of the variables, and (iii) eliminate the intrinsic variables of the 
system to get bounds that must be satisfied by the extrinsic random variables. Usually performing 
steps (i),(ii) and (iii) requires ingenuity because they tend to be computationally intractable for 
many problems of interest (see [19] for example). For the multi-version coding problem, we face 
some additional challenges since we cannot use steps (i), (ii) and (iii) directly.

To understand the challenges, we re-examine our approach to deriving a converse in [18], where 
the decoder was restricted to recovering the latest common version. For the problem in [18], the 
standard approach to deriving converses in multi-user information theory was applicable. In the
multi-version coding problem, note that for state $\mathbf{S}$ we may express the constraint at the encoder
as

$$\log q \ge H\left(\varphi_{S(i)}^{(i)}(W_{[\nu]})\right), \quad H\left(\varphi_{S(i)}^{(i)}(W_{[\nu]}) \mid W_{S(i)}\right) = 0, \qquad (2)$$

for every $i \in [n]$ and every possible state $S(i) \in \mathcal{P}([\nu])$, and for any distribution on the messages in
the system.

In [18], where we constrained the decoder to decode the latest common version, we were able
to similarly express the constraint at the decoder. For example, considering the first c servers and a
state $\mathbf{S}$ where the latest common version is $k \in [\nu]$ for the servers $[c]$, we expressed the decoding
constraint as

$$H\left(W_k \mid \varphi_{S(1)}^{(1)}(W_{[\nu]}), \ldots, \varphi_{S(c)}^{(c)}(W_{[\nu]})\right) = 0. \qquad (3)$$

Note that the above equation can be written for every possible state $\mathbf{S}$. If we assume a uniform
distribution for all the messages, we have $H(W_{[\nu]}) = \nu \log M$. Combining this with (2) and (3), and
using Shannon information inequalities, we obtained a bound on $\log q / \log M$.

In the problem we consider here, we can similarly write the constraints (2). However, in the
multi-version coding problem, the constraint at the decoder cannot be expressed in a manner analogous
to (3) because the decoder does not have a specific message to decode. At any given state of
the system, a decoder that connects to c servers is allowed to decode one of several messages. In
particular, imagine that version k is the latest common version for the servers $[c]$, when the system
state is $\mathbf{S}$. Then the decoder is allowed to decode any one of $W_k, W_{k+1}, \ldots, W_{\nu}$ for state $\mathbf{S}$. In
fact, one can conceive of a decoder that may return different message versions for the same state,
depending on the message realization. For instance, one can conceive of a multi-version code where,
for a given state $\mathbf{S}$, when the encoded message tuple is $W_{[\nu]} = (w_1, w_2, \ldots, w_{\nu})$, the decoder $\psi_{\mathbf{S}}^{([c])}$
outputs $W_k$, and when the encoded message tuple is $W_{[\nu]} = (w'_1, w'_2, \ldots, w'_{\nu})$, the decoder
outputs $W_{k+1}$. As a consequence of the unusual nature of the decoding constraint, the converse
proofs in Sections V and VI have an unusual structure. In particular, we carefully construct some
auxiliary variables and write constraints on the entropies of the constructed variables to replace
(3). Our approach to deriving converses is potentially useful for understanding the pliable index coding
problem and other recently formulated content-type coding problems [20], [21], where the decoder
does not have a unique message, but is satisfied with reliably obtaining one of a given subset of
messages.


IV. Code Construction 

We describe our construction in this section. We start with the code construction for $\nu = 2$ versions,
and then generalize the construction for arbitrary $\nu$. In the end, we show that our construction is
a multi-version code in Theorem 4.

In our construction, each server encodes different versions separately. So that the total number 
of bits stored at a server is the sum of the storage costs of each of the versions in the server state. 
The encoding strategy at the servers satisfies the following property: Suppose that Server i is in 
state S C [i/\ and stores a[ S J logM bits of Version v, then Version v can be recovered from the c 
servers i\fi 2, ...i c , so long as 



Note that such an encoding function can be found for a sufficiently large value of $q$ using standard
coding techniques. In fact, suppose that the message $W_v$ is interpreted as a vector over some finite
field. We let Server $i$ store $\alpha_{i,v}^{(S)} \log M$ random linear combinations of elements in the vector $W_v$. Then
Version $v$ can be recovered from any subset of $c$ servers satisfying (4) with a non-zero probability so
long as the field size is sufficiently large. As a result, there exists a deterministic code that decodes
Version $v$ if (4) is satisfied. We also note that, in our approach, the storage allocation $\alpha_{i,v}^{(S)}$ only
depends on the server state but not on the server index. Therefore, we can write $\alpha_{i,v}^{(S)} = \alpha_v^{(S)}$ for
any Server $i$ at a nonempty state $S \subseteq [\nu]$.
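
As a rough illustration of the recovery condition (4), the sketch below swaps the random linear combinations for Vandermonde combinations over a small prime field, which is one standard way to obtain a deterministic code of this kind. The field size `P`, the evaluation points, and all function names are assumptions made for this example; they are not part of the construction in the text. A version made of `k` field symbols is recovered once the connected servers hold `k` coded symbols in total.

```python
P = 257  # prime field size (assumed for this sketch)


def vandermonde_row(x, k):
    """Coefficients (1, x, x^2, ..., x^(k-1)) over GF(P)."""
    return [pow(x, e, P) for e in range(k)]


def encode(message, x_values):
    """Return one coded symbol (x, value) per evaluation point x."""
    k = len(message)
    return [(x, sum(a * m for a, m in zip(vandermonde_row(x, k), message)) % P)
            for x in x_values]


def solve_mod_p(rows, rhs):
    """Gauss-Jordan elimination over GF(P) for a square invertible system."""
    k = len(rows)
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)        # inverse via Fermat's little theorem
        aug[col] = [(a * inv) % P for a in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]


def decode(symbols, k):
    """Recover the k message symbols from any k coded symbols with distinct x."""
    chosen = symbols[:k]
    rows = [vandermonde_row(x, k) for x, _ in chosen]
    return solve_mod_p(rows, [y for _, y in chosen])


message = [5, 17, 200, 0, 42, 99]                 # one version, k = 6 symbols
# Three servers, each storing a 1/3 fraction (2 of 6 symbols) of the version.
servers = [encode(message, [2 * i + 1, 2 * i + 2]) for i in range(3)]
recovered = decode(servers[0] + servers[1] + servers[2], len(message))
```

Here the three allocations of 1/3 add up to 1, so the decoder has exactly `k` independent equations; with only two of the servers it would have four equations for six unknowns.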

As a result, to describe our construction, we only need to specify the parameter $\alpha_v^{(S)}$ for every
possible server state $S \subseteq [\nu]$ and every $v \in S$. That is, we only need to specify the information
amount corresponding to Version $v$ stored at a server in state $S$. We denote $\alpha = \frac{\log q}{\log M}$ as the storage
cost. Note that we have $\alpha = \max_{S \subseteq [\nu]} \sum_{v \in S} \alpha_v^{(S)}$.

Definition 4 (Partition of the servers) For every system state $\mathbf{S}$ where $S(i) \neq \emptyset$ for all $i \in [n]$, we
define a partition of the $n$ servers into $\nu$ groups as follows. For $i \in [\nu]$, Group $i$ has the set of servers
which have Version $i$ as the latest version.

For instance, if $\nu = 2$, Group 1 has the servers in state $\{1\}$, and Group 2 contains the servers in
states $\{2\}$ and $\{1,2\}$.


A. Code Construction for $\nu = 2$

We start by describing our construction for the case of $\nu = 2$ versions shown in Table IV. In
Theorem 3, we show that our construction is a multi-version code.


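Construction 1 Define

$$t = \left\lceil \frac{c-1}{2} \right\rceil + 1.$$

We construct a code for $\nu = 2$ with storage cost $\alpha = \frac{2t-1}{tc}$. More specifically, we assign

$$\alpha_1^{(\{1,2\})} = \alpha - \frac{1}{t}, \qquad \alpha_1^{(\{1\})} = \alpha, \qquad \alpha_2^{(\{1,2\})} = \alpha_2^{(\{2\})} = \frac{1}{t}.$$
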

One can see that the code in Table III is an example that follows the above storage allocation.
It is instructive to note that if $c$ is odd, then $\alpha_1^{(\{1,2\})} = 0$ and $\alpha_2^{(\{1,2\})} = \alpha = \frac{2}{c+1}$. This means that if
$c$ is odd, each server simply stores $\frac{2}{c+1} \log_2 M$ bits of the latest version. That is, servers in Group
1 store $\frac{2}{c+1} \log_2 M$ bits of Version 1, and servers in Group 2 store $\frac{2}{c+1} \log_2 M$ bits of Version 2. By
the pigeon-hole principle, among any $c$ servers there are at least $\frac{c+1}{2}$ servers either in Group 1 or
in Group 2; therefore a decoder can connect to any $c$ servers and decode either Version 1 or Version 2.
Furthermore, the decoder always obtains the latest common version, or a later version. As a
consequence, our storage strategy forms a multi-version code. We next provide a formal proof that
handles odd and even values of $c$ together.


Theorem 3 Construction 1 is an $(n, c, \nu = 2, M, q)$ multi-version code with storage cost of $\frac{2}{c+1}$ for odd
$c$, and $\frac{2(c+1)}{c(c+2)}$ for even $c$.

Proof: Consider any set of $c$ servers. We argue that the latest common version or a later version
is decodable for every possible state.

Case I. If the latest common version is Version 2, then all the $c$ servers are in Group 2. Since we
have $c \geq t$ servers, and each server contains $1/t$ amount of Version 2, Version 2 is recoverable.

Case II. If the latest common version is Version 1, then the $c$ servers may be in state $\{1\}$ or state
$\{1,2\}$. If there are at least $t$ servers in state $\{1,2\}$, then we can recover Version 2. Otherwise, there
are at most $t-1$ servers in state $\{1,2\}$, and at least $c-t+1$ servers in state $\{1\}$. Thus the total
amount of Version 1 in these servers is at least

$$(c-t+1)\alpha + (t-1)\left(\alpha - \frac{1}{t}\right) = 1,$$

so we can recover Version 1. ■

| State | {1,2}            | {1}      | {2}   |
|-------|------------------|----------|-------|
| Ver 1 | $\alpha - 1/t$   | $\alpha$ |       |
| Ver 2 | $1/t$            |          | $1/t$ |

TABLE IV

Storage allocations for code construction with $\nu = 2$ versions. Note that $t = \left\lceil \frac{c-1}{2} \right\rceil + 1$, and the storage cost is $\alpha = \frac{2t-1}{tc}$.
More specifically, $\alpha = 1/t$ for odd values of $c$ and $\alpha = \frac{2(c+1)}{c(c+2)}$ for even values of $c$.
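
The case analysis in the proof of Theorem 3 can also be checked exhaustively for small $c$. The sketch below is only an illustration: it encodes the allocation of Table IV with exact fractions and tests, for every multiset of $c$ server states that share a version, whether the latest common version or a later one has total allocation at least 1. The function names are chosen for this example.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def construction1(c):
    """Return (t, alpha, allocation) for nu = 2, following Table IV."""
    t = -(-(c - 1) // 2) + 1                  # ceil((c - 1) / 2) + 1
    alpha = Fraction(2 * t - 1, t * c)
    allocation = {
        frozenset({1}): {1: alpha},
        frozenset({2}): {2: Fraction(1, t)},
        frozenset({1, 2}): {1: alpha - Fraction(1, t), 2: Fraction(1, t)},
    }
    return t, alpha, allocation


def always_decodable(c):
    """Check every multiset of c server states that has a common version."""
    _, _, allocation = construction1(c)
    for states in combinations_with_replacement(list(allocation), c):
        common = frozenset.intersection(*states)
        if not common:
            continue                          # no decoding requirement
        latest_common = max(common)
        stored = {v: sum(allocation[s].get(v, 0) for s in states) for v in (1, 2)}
        if not any(stored[v] >= 1 for v in range(latest_common, 3)):
            return False
    return True
```

Because the allocation depends only on the server state and not on the server index, it is enough to enumerate multisets of states rather than ordered tuples.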


B. Code Construction for an Arbitrary $\nu$

We generalize our constructions to arbitrary values of $\nu$. We first provide our construction in
Construction 2 and then prove in Theorem 4 that it is a multi-version code.


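Construction 2 Define a parameter $t$ as follows.

$$t = \begin{cases} \left\lceil \frac{c-1}{\nu} \right\rceil + 1, & c > (\nu-1)^2, \\ \left\lceil \frac{c}{\nu-1} \right\rceil, & c \leq (\nu-1)^2. \end{cases} \qquad (5)$$

We construct the $(n, c, \nu, M, q)$ code with storage cost

$$\frac{\log q}{\log M} = \max\left\{ \frac{\nu t - \nu + 1}{tc}, \frac{1}{t} \right\}. \qquad (6)$$
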


For state $S$, the parameter $\alpha_v^{(S)}$ is set as follows:

• If Version $j$, $j \geq 2$, is the latest version in state $S$, then $\alpha_j^{(S)} = \frac{1}{t}$, that is, store $\frac{1}{t} \log_2 M$ bits of
Version $j$.

• If Version $j$, $j \geq 2$, is the latest version in state $S$, and $1 \in S$, then $\alpha_1^{(S)} = \alpha - \frac{1}{t}$, that is, store
$(\alpha - \frac{1}{t}) \log_2 M$ bits of Version 1.

• If Version 1 is the latest version, namely $S = \{1\}$, then $\alpha_1^{(S)} = \alpha$. That is, store $\alpha \log_2 M$ bits of
Version 1.


Note that in our construction, a server in Group j only stores encoded symbols of Version j and 
possibly Version 1. 

It is useful to note that if $c > (\nu-1)^2$, then $t \geq \nu$, and if $c \leq (\nu-1)^2$, then $t < \nu$.


Remark 2 It can be readily verified that the storage cost of Construction 2 can be expressed more
explicitly as follows:

$$\alpha = \begin{cases} \frac{1}{t}, & \text{if } c \in [(t-1)\nu + 1, (\nu-1)t],\ t < \nu, \\ \frac{\nu t - \nu + 1}{tc}, & \text{otherwise,} \end{cases}$$

where $t$ is defined as in (5).
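
The parameter $t$ in (5) and the storage cost in (6) are simple to evaluate numerically. The following sketch, with function names chosen for this example, computes both with exact fractions and compares (6) against the case expression of Remark 2; it assumes $\nu \geq 2$.

```python
from fractions import Fraction


def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def t_param(c, nu):
    """Parameter t from (5)."""
    if c > (nu - 1) ** 2:
        return ceil_div(c - 1, nu) + 1
    return ceil_div(c, nu - 1)


def storage_cost(c, nu):
    """Storage cost alpha from (6)."""
    t = t_param(c, nu)
    return max(Fraction(nu * t - nu + 1, t * c), Fraction(1, t))


def storage_cost_remark2(c, nu):
    """The same cost, written in the case form of Remark 2."""
    t = t_param(c, nu)
    if t < nu and (t - 1) * nu + 1 <= c <= (nu - 1) * t:
        return Fraction(1, t)
    return Fraction(nu * t - nu + 1, t * c)
```

For example, `storage_cost(5, 3)` evaluates to 7/15 and `storage_cost(7, 3)` to 1/3, the values used in Tables VI and V.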

Remark 3 We note that when $\nu \mid (c-1)$, we have

$$t = \frac{c+\nu-1}{\nu}$$

irrespective of whether $c$ is bigger than $(\nu-1)^2$ or not. As a result, we have $\alpha = \frac{\nu}{c+\nu-1}$ when $\nu \mid (c-1)$.
In this case, as per Construction 2, a server in Group $i$ stores $\frac{\nu}{c+\nu-1} \log_2 M$ bits of Version $i$, and does
not store any of the older versions. A simple pigeon-hole principle based argument suffices to ensure
that any decoder that connects to $c$ servers decodes the latest common version among the servers, or a
later version.

Fig. 2. Comparison between the construction for the code in Construction 2, the code in [18], and the smaller of replication
and the simple MDS code. We fix $\nu = 5$ versions and plot results for different number of connected servers, $c$.

| Group     | 1   | 2   | 2     | 3   | 3     | 3     | 3       |
|-----------|-----|-----|-------|-----|-------|-------|---------|
| State     | {1} | {2} | {1,2} | {3} | {1,3} | {2,3} | {1,2,3} |
| Version 1 | 1/3 |     | 0     |     | 0     |       | 0       |
| Version 2 |     | 1/3 | 1/3   |     |       | 0     | 0       |
| Version 3 |     |     |       | 1/3 | 1/3   | 1/3   | 1/3     |

TABLE V

Storage allocations for code with $c = 7$, $\nu = 3$, $\alpha = 1/3$.

In Figure 2 we show the storage cost of the construction with $\nu = 5$ versions; one can see the
advantage of the proposed code compared to previous results.

Table V is an example with $c = 7$, $\nu = 3$, $\alpha = 1/3$. Notice that in this case $\nu \mid (c-1)$, so each server only
stores information about the latest version it receives, and does not store any information about
any of the older versions. It is easy to see that when connected to $c = 7$ servers with a common
version, at least one version, say Version $i$, can be decoded from 3 servers in Group $i$, using arguments
similar to those in the proof of Theorem 3.

Table VI is an example for $t = 3$, $c = 5$, $\nu = 3$, $\alpha = 7/15$. In this example, the storage costs of
the states are not equal, but one can simply treat the worst-case size as $\alpha$. One can check that the
code recovers the latest common version, or a version that is later than the latest common
version. For example, suppose the latest common version is Version 1.

• If at least three of the $c$ servers are in Group 2, then Version 2 is recoverable.

• If at least three of the $c$ servers are in Group 3, then Version 3 is recoverable.

• Otherwise, among the $c$ servers, at most two servers are in Group 2, at most two servers are
in Group 3, and at least one server is in state $\{1\}$. The amount of information of Version 1 in
these $c$ servers is at least $7/15 + 2/15 \times 4 = 1$, which implies that Version 1 is recoverable.

| Group     | 1    | 2   | 2     | 3   | 3     | 3     | 3       |
|-----------|------|-----|-------|-----|-------|-------|---------|
| State     | {1}  | {2} | {1,2} | {3} | {1,3} | {2,3} | {1,2,3} |
| Version 1 | 7/15 |     | 2/15  |     | 2/15  |       | 2/15    |
| Version 2 |      | 1/3 | 1/3   |     |       | 0     | 0       |
| Version 3 |      |     |       | 1/3 | 1/3   | 1/3   | 1/3     |

TABLE VI

Server storage allocations for $c = 5$, $\nu = 3$, $\alpha = 7/15$.
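
The check described for Table VI can be automated. The sketch below is an illustration with names chosen for this example: it builds the per-state allocation of Construction 2 from given values of $t$ and $\alpha$, enumerates all multisets of $c$ non-empty server states, and tests whether the latest common version or a later version has total allocation at least 1.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement


def allocation(state, t, alpha):
    """Per-version storage (in units of log M) for one server state."""
    latest = max(state)
    if latest == 1:
        return {1: alpha}
    alloc = {latest: Fraction(1, t)}
    if 1 in state:
        alloc[1] = alpha - Fraction(1, t)
    return alloc


def nonempty_states(nu):
    """All non-empty subsets of {1, ..., nu}."""
    versions = range(1, nu + 1)
    return [frozenset(s) for r in range(1, nu + 1) for s in combinations(versions, r)]


def decodable(states, t, alpha, nu):
    """True if the latest common version, or a later one, is recoverable."""
    common = frozenset.intersection(*states)
    if not common:
        return True                              # no decoding requirement
    for v in range(max(common), nu + 1):
        if sum(allocation(s, t, alpha).get(v, 0) for s in states) >= 1:
            return True
    return False


def check(c, nu, t, alpha):
    """Exhaustive check over all multisets of c non-empty server states."""
    return all(decodable(states, t, alpha, nu)
               for states in combinations_with_replacement(nonempty_states(nu), c))
```

With the parameters of Table VI, `check(5, 3, 3, Fraction(7, 15))` succeeds, while lowering $\alpha$ to 6/15 with the same $t$ fails, for instance when one server is in state {1} and two servers each are in states {1,2} and {1,3}.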

Theorem 4 The code in Construction 2 is a causal $(n, c, \nu, M, q)$ multi-version code.

Proof: To show a version is recoverable, it suffices to show that the total storage allocation
for that version in the connected servers is at least 1. Let $j$ be the latest common version among
$c$ servers. Note that there are at most $\nu - j + 1$ non-empty groups, since Group 1, Group 2, ..., Group $j-1$
are empty.

Case I. When $j \geq 2$, there exists a group, say Group $k$, with at least $\left\lceil \frac{c}{\nu-j+1} \right\rceil \geq \left\lceil \frac{c}{\nu-1} \right\rceil$ servers.
In our construction, each server in Group $k$ stores $\frac{1}{t}$ of Version $k$. To prove the theorem, it suffices
to show that $\left\lceil \frac{c}{\nu-1} \right\rceil \geq t$, since this implies that Version $k$ is recoverable from the servers in Group
$k$. When $c \leq (\nu-1)^2$, then $\left\lceil \frac{c}{\nu-1} \right\rceil = t$. Therefore, we need to show this for $c > (\nu-1)^2$. When
$c > (\nu-1)^2$, we have $t = \left\lceil \frac{c-1}{\nu} \right\rceil + 1$. Therefore, we have $c \in [\nu(t-2)+2, \nu(t-1)+1]$. Notice also
that $t \geq \nu$. These imply the following.

$$\left\lceil \frac{c}{\nu-1} \right\rceil \geq \left\lceil \frac{\nu(t-2)+2}{\nu-1} \right\rceil = \left\lceil t - 1 + \frac{t-\nu+1}{\nu-1} \right\rceil \geq t.$$

Therefore, the theorem is proved for the case where $j \geq 2$.

Case II. When $j = 1$ is the latest common version, if Group $i$ has at least $t$ servers for any
$2 \leq i \leq \nu$, then Version $i$ is recoverable and therefore the theorem is proved. Otherwise, there are
at most $t-1$ servers in Group $i$, for all $2 \leq i \leq \nu$, each of which stores $\alpha - \frac{1}{t}$ size of Version 1;
and thus at least $c - (\nu-1)(t-1)$ servers in Group 1, each storing $\alpha$ size of Version 1. The total
storage cost for Version 1 in these servers is at least

$$\left(\alpha - \frac{1}{t}\right)(\nu-1)(t-1) + \alpha\bigl(c - (\nu-1)(t-1)\bigr).$$

By the choice of $\alpha$ in (6), we know the above amount is at least 1. Therefore, Version 1 can be
recovered.

If a server is in state $S$ and Version $j$ arrives, the server simply needs to encode based on its stored
information and the information of Version $j$: (i) if $j < \max S$, the server does nothing; (ii) otherwise, the
server removes $1/t$ amount of information of Version $\max S$, and replaces it with $1/t$ amount of
information of Version $j$. Therefore, the construction is causal. ■

From the above results, we can prove Theorem 1.

Proof of Theorem 1: Since Construction 2 is a causal $(n, c, \nu, M, q)$ multi-version code by
Theorem 4, and has storage cost as in (6), the theorem is proved. ■

In fact, the construction in this section is inspired by computer search for $\nu = 3$ and small
values of $c$ using integer linear programming on the allocated storage sizes. In particular, denote by
$\mathbf{S} = (S(1), \ldots, S(c))$ a $c$-tuple server state. Define the latest common version $m(\mathbf{S}) = \max \bigcap_{i=1}^{c} S(i)$,
for $\bigcap_{i=1}^{c} S(i) \neq \emptyset$. We assume that the versions are coded separately, and use $\alpha_v^{(S)}$ to denote the
storage allocation of Version $v$ at a server in state $S$, for $v \in S$. Then we have the optimization
problem with respect to variables $\alpha, \alpha_v^{(S)}$:

$$\begin{aligned}
\text{minimize} \quad & \alpha && (7) \\
\text{s.t.} \quad & \alpha_v^{(S)} \geq 0, \quad \text{for all } S \in \mathcal{P}([\nu]),\ v \in S, && (8) \\
& \sum_{v \in S} \alpha_v^{(S)} \leq \alpha, \quad \text{for all } S \in \mathcal{P}([\nu]), && (9) \\
& \bigvee_{v = m(\mathbf{S})}^{\nu} \left( \sum_{i=1}^{c} \alpha_v^{(S(i))} \geq 1 \right), \quad \text{for all } \mathbf{S} \in \mathcal{P}([\nu])^c,\ \bigcap_{i=1}^{c} S(i) \neq \emptyset, && (10)
\end{aligned}$$

where $\bigvee$ is the "or" operator. In words, we want to minimize the storage size $\alpha$ subject to the
constraints that the allocation sizes are non-negative (equation (8)), every node stores no more than
$\alpha$ (equation (9)), and the latest common version $m(\mathbf{S})$, or a later version, has enough storage
size to ensure recovery (equation (10)).

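We can use the Big M Method [22] to convert the "or" constraints in (10) to "and" constraints
and solve the problem by integer linear programming. On application of the Big M method, our optimization
problem (7) can be equivalently expressed as

$$\begin{aligned}
\text{minimize} \quad & \alpha \\
\text{s.t.} \quad & \alpha_v^{(S)} \geq 0, \quad \text{for all } S \in \mathcal{P}([\nu]),\ v \in S, \\
& \sum_{v \in S} \alpha_v^{(S)} \leq \alpha, \quad \text{for all } S \in \mathcal{P}([\nu]), \\
& \text{for all } \mathbf{S} \in \mathcal{P}([\nu])^c,\ \bigcap_{i=1}^{c} S(i) \neq \emptyset: \\
& \qquad \sum_{i=1}^{c} \alpha_v^{(S(i))} \geq 1 - y_v, \quad \forall v \geq m(\mathbf{S}), \\
& \qquad \sum_{v \geq m(\mathbf{S})} y_v \leq \nu - m(\mathbf{S}), \\
& \qquad 0 \leq y_v \leq 1,\ y_v \in \mathbb{Z}, \quad \forall v \geq m(\mathbf{S}).
\end{aligned}$$

Plugging in small values of $c$ and $\nu = 3$, one can obtain the constructed code as one solution to the
above optimization problem.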
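
The conversion from the "or" constraint (10) to the "and" form can be sanity-checked on a single state tuple. The sketch below is illustrative only: `sums[v]` stands for the total allocation $\sum_{i} \alpha_v^{(S(i))}$ of Version $v$ over the connected servers, assumed non-negative as in (8), and the function names are made up for this example. It searches over the binary variables $y_v$ by brute force rather than calling an ILP solver.

```python
from fractions import Fraction
from itertools import product


def or_constraint(sums, m, nu):
    """Constraint (10): some version v >= m has total allocation >= 1."""
    return any(sums[v] >= 1 for v in range(m, nu + 1))


def big_m_constraint(sums, m, nu):
    """Search for binary y_v with sums[v] >= 1 - y_v and sum(y) <= nu - m."""
    versions = range(m, nu + 1)
    for y in product((0, 1), repeat=len(versions)):
        within_budget = sum(y) <= nu - m       # at least one y_v must be 0
        covered = all(sums[v] >= 1 - y_v for v, y_v in zip(versions, y))
        if within_budget and covered:
            return True
    return False


example = {1: Fraction(2, 3), 2: Fraction(1), 3: Fraction(1, 3)}
```

For `example` with `m = 1` and `nu = 3`, both forms hold, since setting $y_2 = 0$ and $y_1 = y_3 = 1$ satisfies the "and" constraints.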

We would like to point out the low complexity of updating information in the servers in our
constructions. As Theorem 4 states, our constructions are causal codes. Whenever a version arrives
that is the latest among all received ones, the server only needs to delete (a part of) the older
version(s) and store the latest version. In addition, when $\nu \mid (c-1)$, no matter how many versions are
in the server state, the server stores information about only the latest version. In this case, every
server only manages a single version and has relatively low complexity compared to the simple MDS
coding scheme.


V. Proof of Converse for 2 Versions 

In this section, we prove Theorem 2 for the case of $\nu = 2$ versions. The proof serves as the inspiration
for the proof for general $\nu$ in the next section.

Consider any $(n, c, 2, M, q)$ multi-version code, and consider the first $c \leq n$ servers. We note here
that an arbitrary set of $c$ servers can be considered for the converse; we consider the first $c$ servers
without loss of generality. In particular, we let the server state be the empty set $\emptyset$ if the server
index is larger than $c$, and we always try to decode from the first $c$ servers.

Informally, the main idea of our argument is as follows. We begin with the following claim: given
the values of the two versions, $W_{[2]} = (W_1, W_2)$, there exist two system states, $\mathbf{S}_1, \mathbf{S}_2 \in \mathcal{P}([\nu])^n$, such
that

• the states $\mathbf{S}_1, \mathbf{S}_2$ differ only in the state of one server, say, Server $A$, and

• $W_1$ is decodable from the symbols stored among the first $c$ servers in state $\mathbf{S}_1$, and $W_2$ is
decodable from the symbols stored in the first $c$ servers in state $\mathbf{S}_2$.

However, notice that the encoded symbols of the servers $[n] - \{A\}$ are the same in both states
$\mathbf{S}_1$ and $\mathbf{S}_2$. This implies that both Version 1 and Version 2 are decodable from the following $c+1$
symbols: the $c$ codeword symbols of the first $c$ servers in state $\mathbf{S}_1$, and the codeword symbol of the $A$-th
server in state $\mathbf{S}_2$. Note that $\mathbf{S}_1$, $\mathbf{S}_2$, and $A$ are chosen based on the values of $W_{[2]}$; in fact, they
may be viewed as functions of $W_{[2]}$.

We now construct $c+1$ variables $Y_{[c]}, Z$ as follows: $Y_i$ is the value stored in the $i$-th server for
$i \in [c]$, when the server is in state $\mathbf{S}_1(i)$, and $Z$ is the value stored in the $A$-th server when the
server is in state $\mathbf{S}_2(A)$. Notice that the variables $Y_i$, $i \in [c]$, and $Z$ all belong to $[q]$.

Since these 2 versions, $W_1, W_2$, each of alphabet size $M$, are decodable from the $c+1$ auxiliary
variables $Y_{[c]}, Z$ with an alphabet of size $q$, we need $(c+1) \log q \geq 2 \log M + o(1)$. We provide a
formal proof next.

Formal Proof

Let $\mathcal{S}$ be the set of system states

$$\mathcal{S} = \{\mathbf{S} \in \mathcal{P}([\nu])^n : S(i) = \{1,2\}\ \forall i \in [x],\ \ S(i) = \{1\}\ \forall i \in [x+1, c],\ \ S(i) = \emptyset\ \forall i \in [c+1, n],\ \text{for some } x \in [0, c]\}.$$

For given values of $W_{[2]}$, we define two subsets of $\mathcal{S}$ according to the version decoded from Servers
$[c]$, denoted by $\mathcal{S}_1, \mathcal{S}_2$: for $i = 1, 2$,

$$\mathcal{S}_i = \{\mathbf{S} \in \mathcal{S} : \psi_{\mathbf{S}}^{([c])}\bigl(\varphi_{S(1)}^{(1)}(W_{S(1)}), \ldots, \varphi_{S(c)}^{(c)}(W_{S(c)})\bigr) = W_i\}.$$

We can see that any system state in the set $\mathcal{S}$ has the following structure: for some $x \in [0, c]$, the
first $x$ servers have both versions, servers $[x+1, c]$ have the first version, and the remaining servers
have no version. Notice that for any system state in $\mathcal{S}$, there exists a latest common version among
the first $c$ servers. This means that for every state in $\mathcal{S}$, the corresponding decoding function must
return Version 1 or Version 2. Thus, $\mathcal{S}_1 \cup \mathcal{S}_2 = \mathcal{S}$. The subset $\mathcal{S}_i$ is one where the decoding function
returns $W_i$, the value of Version $i$, from the first $c$ servers, for $i = 1, 2$. When $W_1 \neq W_2$, for any
state in $\mathcal{S}$ the decoding function returns only one version; therefore $\mathcal{S}_1, \mathcal{S}_2$ form a partition of $\mathcal{S}$.
When $W_1 = W_2$, the value returned for any state equals both versions, so $\mathcal{S}_1 = \mathcal{S}_2 = \mathcal{S}$.

Claim 1 For any achievable $(n, c, 2, M, q)$ code, and given values $W_{[2]}$, there are two states $\mathbf{S}_1, \mathbf{S}_2 \in \mathcal{P}([\nu])^n$
such that

• The $n$-length tuples $\mathbf{S}_1$ and $\mathbf{S}_2$ differ in one element indexed by $A \in [c]$, that is, they differ with
respect to the state of at most one of the first $c$ servers.

• $\mathbf{S}_1 \in \mathcal{S}_1$ and $\mathbf{S}_2 \in \mathcal{S}_2$.

Proof: Assume $W_1 = W_2$; then simply take $\mathbf{S}_1$ such that its first $c$ elements are all $\{1\}$, and
the remaining elements are all $\emptyset$. Take $\mathbf{S}_2$ the same as $\mathbf{S}_1$ except that the first element is $\{1,2\}$.
They differ at index $A = 1$. One can easily check the conditions in the claim.

Assume $W_1 \neq W_2$. Consider a state with the smallest number, $A$, of occurrences of $\{1,2\}$ in
partition $\mathcal{S}_2$ and denote this state as $\mathbf{S}_2$. In other words,

$$\mathbf{S}_2 = \arg\min_{\mathbf{S} \in \mathcal{S}_2} |\{i : S(i) = \{1,2\}\}|.$$

Let $\mathbf{S}_1$ be the state obtained by replacing the $A$-th element of $\mathbf{S}_2$, which is $\{1,2\}$, by $\{1\}$. Notice that, since
the number of occurrences of $\{1,2\}$ in the state tuple $\mathbf{S}_1$ is smaller than the number of occurrences
in $\mathbf{S}_2$, the state $\mathbf{S}_1$ does not lie in partition $\mathcal{S}_2$. Furthermore, state $\mathbf{S}_1$ lies in $\mathcal{S}$. Therefore $\mathbf{S}_1$ lies in
partition $\mathcal{S}_1$. It is easy to verify that states $\mathbf{S}_1$ and $\mathbf{S}_2$ satisfy the conditions of the claim. ■

Next, we define $c+1$ variables $Y_{[c]}, Z$. Denote by $A$ the number of servers in state $\{1,2\}$ for $\mathbf{S}_2$ found
by the proof of Claim 1, or equivalently the largest server index in state $\{1,2\}$ for $\mathbf{S}_2$. Denote by $Y_{[c]}$ the values
stored in the first $c$ servers when the system state is $\mathbf{S}_1$: for $i \in [c]$,

$$Y_i = \varphi_{\mathbf{S}_1(i)}^{(i)}(W_{\mathbf{S}_1(i)}).$$

Denote by $Z$ the value stored in the $A$-th server when the server state is $\mathbf{S}_2(A) = \{1,2\}$:

$$Z = \varphi_{\{1,2\}}^{(A)}(W_{[2]}).$$

Proof of Theorem 2 for $\nu = 2$: Consider any $(n, c, \nu = 2, M, q)$ code. Given the value of the
variable $A$, we can determine the two states $\mathbf{S}_1, \mathbf{S}_2$ as in Claim 1. Therefore, if we are given the
values of $A$, $Y_{[c]}$ and $Z$, we can determine the values of $W_{[2]}$:

$$W_1 = \psi_{\mathbf{S}_1}^{([c])}(Y_{[c]}),$$

$$W_2 = \psi_{\mathbf{S}_2}^{([c])}(Y_{[A-1]}, Z, Y_{[A+1,c]}).$$

Hence, there is a bijective mapping from $(Y_{[c]}, Z, A)$ to $W_{[2]}$, and the following equation
is true for any distribution over $W_{[2]}$:

$$H(W_{[2]} \mid Y_{[c]}, Z, A) = 0.$$

Therefore,

$$I(Y_{[c]}, Z; W_{[2]} \mid A) = H(W_{[2]} \mid A) = H(W_{[2]}) - I(W_{[2]}; A) \geq H(W_{[2]}) - \log c.$$

The last inequality holds because the alphabet size of $A$ is at most $c$. We have the following chain
of inequalities:

$$(c+1) \log q \geq I(Y_{[c]}, Z; W_{[2]} \mid A) \geq H(W_{[2]}) - \log c.$$

The first inequality follows because $Y_i, Z$ belong to $[q]$ for every $i \in [c]$. Since the code should work
for any distribution of $W_{[2]}$, we assume that $W_1, W_2$ are independent and uniformly distributed
over $[M]$. Then the theorem statement follows. ■
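
With $W_1, W_2$ independent and uniform, $H(W_{[2]}) = 2 \log M$, so the chain above reads $(c+1) \log q \geq 2 \log M - \log c$. The sketch below, with function names chosen for this example, evaluates the resulting lower bound on $\frac{\log q}{\log M}$ and places it next to the storage cost $\frac{2t-1}{tc}$ of Construction 1. The value of $\log_2 M$ passed in is an arbitrary choice for illustration.

```python
from fractions import Fraction
from math import log2


def lower_bound_two_versions(c, log2_M):
    """Rearrangement of (c + 1) log q >= 2 log M - log c."""
    return 2 / (c + 1) - log2(c) / ((c + 1) * log2_M)


def construction1_cost(c):
    """Storage cost (2t - 1) / (t c) of Construction 1."""
    t = -(-(c - 1) // 2) + 1                  # ceil((c - 1) / 2) + 1
    return Fraction(2 * t - 1, t * c)


def gap(c, log2_M):
    """Difference between the achievable cost and the lower bound."""
    return float(construction1_cost(c)) - lower_bound_two_versions(c, log2_M)
```

For odd $c$ the achievable cost equals $\frac{2}{c+1}$, so the gap is only the term $\frac{\log c}{(c+1)\log M}$, which shrinks as $M$ grows.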

It is instructive to observe that in the above proof (and similarly in the proof of Theorem 2 for
general $\nu$), for different values of $W_{[2]}$, the parameters $A, Y_{[c]}, Z$ may take different values. If
we constrain the multi-version codes so that the decoding function $\psi_{\mathbf{S}}^{(T)}$ in Definition 1 returns a
fixed version index $m$ given the system state $\mathbf{S}$ and the set of connected servers $T$, $T \subseteq [n]$, $|T| = c$,
then the lower bound can be strengthened [23]. Our converse proof here is applicable
even for multi-version codes where the decoded version index $m$ depends not only on $\mathbf{S}$ and $T$, but
could also depend on the message values $W_{[\nu]}$.


VI. Proof of Converse for an Arbitrary $\nu$

In this section, we provide a proof of Theorem 2 for arbitrary values of $\nu$. Given an $(n, c, \nu, M, q)$
multi-version code, we can obtain a $(c, c, \nu, M, q)$ multi-version code by simply using the encoding
functions corresponding to the first $c$ servers of the given $(n, c, \nu, M, q)$ code. Furthermore, the
storage cost of the $(c, c, \nu, M, q)$ multi-version code is identical to the storage cost of the $(n, c, \nu, M, q)$
multi-version code. Therefore, to derive a lower bound on the storage cost, it suffices to restrict
$n$ to be equal to $c$. Consider an arbitrary $(c, c, \nu, M, q)$ multi-version code. Consider the set $\mathcal{W}$ of
message-tuples whose components are distinct, that is,


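$$\mathcal{W} = \{W_{[\nu]} : W_i \neq W_j \text{ if } i \neq j\}. \qquad (11)$$

Denote by $\mathbf{1}_{W_{[\nu]} \in \mathcal{W}}$ the indicator variable:

$$\mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = \begin{cases} 1, & \text{if } W_{[\nu]} \in \mathcal{W}, \\ 0, & \text{otherwise.} \end{cases}$$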

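For a given multi-version code, we construct auxiliary variables $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$, where $Y_i, Z_j \in [q]$,
$i \in [c-1]$, $j \in [\nu]$, $1 \leq A_1 \leq \cdots \leq A_\nu \leq c$, and $\Pi : [\nu] \to [\nu]$ is a permutation, such that
there is a bijection from values of $W_{[\nu]}$ in $\mathcal{W}$ to $(Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi)$. In particular, we describe a mapping
AuxVars from values of $W_{[\nu]}$ in $\mathcal{W}$ to $(Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi)$ in Section VI-A, and prove in Section VI-C
that AuxVars is bijective.

Consider an arbitrary probability distribution on $W_{[\nu]}$; then the bijection implies that

$$H(W_{[\nu]} \mid Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi, \mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = 1) = H(Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi \mid W_{[\nu]}, \mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = 1) = 0. \qquad (12)$$

If we assume a uniform distribution on the elements of $[M]^\nu$, then the converse of Theorem 2 follows
from the following set of relations.

$$\begin{aligned}
\log\left(q^{c+\nu-1}\, \nu!\, \binom{c+\nu-1}{\nu}\right)
&\geq H(Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi \mid \mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = 1) \\
&= I(Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi; W_{[\nu]} \mid \mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = 1) \\
&= H(W_{[\nu]} \mid \mathbf{1}_{W_{[\nu]} \in \mathcal{W}} = 1) \\
&= \log |\mathcal{W}| = \log\left(\binom{M}{\nu} \nu!\right) \\
&\geq \log\left(\left(\frac{M}{\nu}\right)^{\nu} \nu!\right),
\end{aligned}$$

where the last inequality follows when $M \geq \nu$, and for the first inequality we use the fact that
$Y_i, Z_j \in [q]$, there are at most $\nu!$ possibilities for $\Pi$, and at most $\binom{c+\nu-1}{\nu}$ possibilities for $A_{[\nu]}$. This
implies that

$$\frac{\log q}{\log M} \geq \frac{\nu}{c+\nu-1} - \frac{\log\left(\nu^{\nu} \binom{c+\nu-1}{\nu}\right)}{(c+\nu-1) \log M} \qquad (13)$$

as required.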
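
To get a feel for how the correction term in the converse behaves, the sketch below evaluates a bound of the form $\frac{\nu}{c+\nu-1} - \frac{\log(\nu^{\nu}\binom{c+\nu-1}{\nu})}{(c+\nu-1)\log M}$. This exact form is an assumption of the example, obtained from the counting argument above ($c+\nu-1$ symbols from $[q]$, $\nu!$ permutations, $\binom{c+\nu-1}{\nu}$ index tuples, and $\binom{M}{\nu} \geq (M/\nu)^{\nu}$); the function name and the sample values of $\log_2 M$ are likewise illustrative.

```python
from math import comb, log2


def converse_bound(c, nu, log2_M):
    """Assumed form of the lower bound on log q / log M for general nu."""
    n_aux = c + nu - 1                         # number of Y and Z symbols
    overhead = log2(nu ** nu * comb(n_aux, nu))
    return nu / n_aux - overhead / (n_aux * log2_M)


# For c = 7, nu = 3 the leading term nu / (c + nu - 1) equals 1/3,
# which is the storage cost of Construction 2 in Table V.
small_M = converse_bound(7, 3, 100)
large_M = converse_bound(7, 3, 10_000)
```

The correction term does not depend on $M$ in its numerator, so the bound approaches $\frac{\nu}{c+\nu-1}$ as $\log M$ grows.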

To complete the proof, we describe the mapping AuxVars in Algorithm 1 in Section VI-A, and
show in Section VI-C that AuxVars is bijective. Section VI-B describes some useful properties of
Algorithm 1.

A. Algorithm Description

The function AuxVars, which takes as input an element $W_{[\nu]}$ from $\mathcal{W}$ and returns variables
$Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$, is described in Algorithm 1. The algorithm description involves the use of a
set-valued function $\chi$, which we refer to as the decodable set function. We define the function next.
The decodable set function is characterized by the following parameters:

• a positive integer $l \leq c$,

• a subset $T$ of versions, $T \subseteq [\nu]$;

it takes as input,

• $l$ states $S_1, S_2, \ldots, S_l \in \mathcal{P}([\nu])$,

• and messages $W_{[\nu]} \in [M]^{\nu}$,

and outputs a subset of $[M]$. Recall that the decoding function $\psi_{\mathbf{S}}^{([c])}$ returns NULL if there is
no common version among the servers $[c]$ in state $\mathbf{S}$. The decodable set function is denoted as
$\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]})$ and defined as

$$\begin{aligned}
\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]}) = \bigl\{ \psi_{\mathbf{S}'}^{([c])}(X_1, X_2, \ldots, X_c) :\ 
& \mathbf{S}' = (S_1, \ldots, S_l, S'_{l+1}, \ldots, S'_c),\ \forall S'_j \subseteq T,\ j \in [l+1, c], \\
& X_m = \varphi_{S_m}^{(m)}(W_{S_m}),\ 1 \leq m \leq l, \\
& X_m = \varphi_{S'_m}^{(m)}(W_{S'_m}),\ l+1 \leq m \leq c \bigr\} - \{\text{NULL}\}. \qquad (14)
\end{aligned}$$

To put it in plain words, the decodable set $\chi_{l|T}$ is the set of all non-null values that the decoding
function can return, given the states of the first $l$ servers and the message realizations $W_{[\nu]}$, when
the states of the last $c - l$ servers are restricted to be subsets of $T$. It is instructive to note that
$\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]})$ is a subset of $\{W_i : i \in [\nu]\}$, as stated in the next lemma.

Lemma 1 For every positive integer $l \in [c]$, set $T \subseteq [\nu]$ and states $S_1, S_2, \ldots, S_l \in \mathcal{P}([\nu])$, we have

$$\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]}) \subseteq \{W_i : i \in [\nu]\}.$$

Proof: For every collection of $c - l$ states $S'_{l+1}, S'_{l+2}, \ldots, S'_c \subseteq T$ such that the state

$$\mathbf{S}' = (S_1, S_2, \ldots, S_l, S'_{l+1}, S'_{l+2}, \ldots, S'_c)$$

has a common version, the decoding function $\psi_{\mathbf{S}'}^{([c])}$ returns a message value that was encoded. Since
the encoded message is $W_{[\nu]}$, the decoding function returns an element in $\{W_i : i \in [\nu]\}$.

If there is no collection of $c - l$ states $S'_{l+1}, S'_{l+2}, \ldots, S'_c \subseteq T$ such that the state

$$\mathbf{S}' = (S_1, S_2, \ldots, S_l, S'_{l+1}, S'_{l+2}, \ldots, S'_c)$$

has a common version, the decoding function $\psi_{\mathbf{S}'}^{([c])}$ returns NULL. In this case, the decodable set
function returns an empty set, which is a subset of $\{W_i : i \in [\nu]\}$. Therefore $\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]})$
is always a subset of $\{W_i : i \in [\nu]\}$. ■

The following property is useful in our description of Algorithm 1.


Lemma 2 Consider messages $W_{[\nu]}$ that have unique values, that is, $W_{[\nu]} \in \mathcal{W}$. Then, for any element
$W \in \chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]})$, there is a unique positive integer $m \in [\nu]$ such that $W_m = W$.

Proof: The lemma readily follows from noting that there is a one-to-one correspondence between
$[\nu]$ and $\{W_i : i \in [\nu]\}$, and that every element $W$ in $\chi_{l|T}$ is also an element in $\{W_i : i \in [\nu]\}$ by Lemma 1. ■

The decodable set function has an intuitive interpretation when $W_{[\nu]}$ has unique values, that is,
when $W_{[\nu]} \in \mathcal{W}$. If $\chi_{l|T}(S_1, S_2, \ldots, S_l, W_{[\nu]}) - \{W_i : i \in T\}$ is non-empty, then, loosely speaking,
this implies that the first $l$ servers contain enough information for at least one message in $[\nu] - T$.
This is because the decodable set function restricts the state of the last $c - l$ servers to be from $T$;
as a consequence, if it returns a value corresponding to a version in $[\nu] - T$, then the first $l$ servers
must contain sufficient information of this version.


Algorithm 1 Function AuxVars: Takes input $W_{[\nu]} \in \mathcal{W}$, and outputs variables $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$.

 1: AuxVars($W_{[\nu]}$)
 2: Initialize VerCount ← 1
 3: Initialize ServCount ← 1
 4: Initialize set VersionsEncountered ← {}
 5: Initialize $Y_j$ ← 1, $Z_k$ ← 1, $A_k$ ← 1, $j \in [c]$, $k \in [\nu]$.
 6: while VerCount ≤ $\nu$ and ServCount ≤ $c$ do
 7:     $S$(ServCount) ← $[\nu]$ − VersionsEncountered
 8:     $T$ ← $[\nu]$ − VersionsEncountered.
 9:     $U$ ← $\{W_u : W_u \in \chi_{\text{ServCount}|T-\{u\}}(S(1), S(2), \ldots, S(\text{ServCount}), W_{[\nu]}),\ u \in T\}$
10:     if $U \neq \emptyset$ then
11:         $W$ ← max $U$                          ▷ Natural ordering on $[M]$ for max
12:         Let $v \in [\nu]$ such that $W_v = W$.   ▷ From Lemma 2 there exists a unique $v$
13:         $A_{\text{VerCount}}$ ← ServCount
14:         $Z_{\text{VerCount}}$ ← $\varphi_{S(\text{ServCount})}^{(\text{ServCount})}(W_{S(\text{ServCount})})$
15:         $\Pi$(VerCount) ← $v$
16:         VersionsEncountered ← VersionsEncountered ∪ $\{v\}$
17:         VerCount ← VerCount + 1
18:     else
19:         $Y_{\text{ServCount}}$ ← $\varphi_{S(\text{ServCount})}^{(\text{ServCount})}(W_{S(\text{ServCount})})$
20:         ServCount ← ServCount + 1
21:     end if
22: end while
23: If $\Pi$ is not a permutation of $[\nu]$, set $\Pi$ to be an arbitrary permutation.   ▷ As a consequence of
    Lemma 4 and Property (5), this line is never executed.
24: Return $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$


In Algorithm 1, we describe the function AuxVars that takes as input $W_{[\nu]} \in \mathcal{W}$ and returns
$(Y_{[c-1]}, A_{[\nu]}, Z_{[\nu]}, \Pi)$. Here, we informally describe the algorithm and examine some properties.

In every iteration of the while loop of Algorithm 1, either VerCount increases by 1, or ServCount
increases by 1. In particular, if Line 10 returns true, then VerCount increases by 1; otherwise
ServCount increases by 1. Therefore, the while loop terminates, and as a consequence, the algorithm
terminates. In our subsequent discussions, we identify an iteration of the while loop by its unique
VerCount-ServCount pair at the beginning of the iteration.

Every iteration of the while loop begins by setting the server state $S$(ServCount) in Line 7. If
Line 10 is false, then the iteration sets $Y_{\text{ServCount}}$ in Line 19 and then increments ServCount. If Line
10 is true, then the iteration sets $A_{\text{VerCount}}$, $Z_{\text{VerCount}}$ and $\Pi$(VerCount) respectively in Lines 13, 14, 15,
and then increments VerCount. In particular, $A_{\text{VerCount}}$ is set to the server index ServCount, and
$\Pi$(VerCount) is set to the version index $v$. Note that $A_1$ is the smallest value of ServCount such
that Line 10 returns true, that is, it is the smallest integer such that $\chi_{A_1|[\nu]-\{u\}}([\nu], [\nu], \ldots, [\nu], W_{[\nu]})$
contains $W_u$ for some $u \in [\nu]$. Intuitively speaking, $A_1$ is the smallest integer such that the
first $A_1$ servers have enough information about some version $v \in [\nu]$, when the states of these
servers are all set to $[\nu]$. If more than one version in $[\nu]$ returns true for the iteration with
ServCount = $A_1$, then $v$ is picked to be the version index corresponding to the maximum value of
$\{W_u : \chi_{A_1|[\nu]-\{u\}}([\nu], [\nu], \ldots, [\nu], W_{[\nu]}) \text{ contains } W_u\}$. The iteration sets $\Pi$(VerCount) to the version
index $v$.

| Iteration | ServCount | VerCount | Outcome                                     |
|-----------|-----------|----------|---------------------------------------------|
| 1         | 1         | 1        | $U = \emptyset$                             |
| 2         | 2         | 1        | $\max U = W_2$, $\Pi(1) = 2$, $A_1 = 2$     |
| 3         | 2         | 2        | $U = \emptyset$                             |
| 4         | 3         | 2        | $U = \emptyset$                             |
| 5         | 4         | 2        | $\max U = W_3$, $\Pi(2) = 3$, $A_2 = 4$     |
| 6         | 4         | 3        | $\max U = W_1$, $\Pi(3) = 1$, $A_3 = 4$     |
| (end)     | 4         | 4        |                                             |

Fig. 3. Example of the algorithm, $c = 4$, $\nu = 3$. The resulting server indices are $A_{[3]} = (2,4,4)$, and the permutation on the
versions is $\Pi = (2,3,1)$.

In Figure 3, we show an example of a possible execution of the algorithm for $\nu = 3$, $c = 4$ that
happens to halt at ServCount = $A_\nu = 4$ for a particular multi-version code and message tuple $W_{[\nu]}$. The
states of the servers are set one by one to $\{1,2,3\}$ as in Line 7. The algorithm proceeds, incrementing
ServCount in every iteration where Line 10 returns false. In an iteration where Line 10 returns true,
VerCount is incremented. Suppose at ServCount = 2, Line 10 returns true for the first time in the
execution, and suppose that $v = 2$ in Line 12; the algorithm sets $\Pi(1) = 2$, $A_1$ = ServCount = 2,
and, in the next iteration, the state of the second server is reset to $[\nu] - \{\Pi(1)\} = \{1,3\}$. Then the
algorithm proceeds, incrementing ServCount every time Line 10 returns false, setting the state of
the corresponding server to $\{1,3\}$. Now, suppose that Line 10 returns true at ServCount = 4, and
that $v = 3$ in Line 12. Then $\Pi(2) = v = 3$, $A_2$ = ServCount = 4. In the next iteration, the
state of server 4 is set to $\{1\}$. Then $A_3$ is set similarly.
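
The steps of Algorithm 1 can be sketched in Python once a concrete code is fixed. The code used below is a toy assumption made only for this example and is not one of the constructions in this paper: every server stores the full value of its latest version, and the decoder returns the newest stored value whenever the connected servers share a version. The decodable set function is evaluated by brute force over all subsets, and Line 23 of Algorithm 1 is omitted.

```python
from itertools import combinations, product

NULL = None


def subsets(versions):
    """All subsets (including the empty set) of a set of version indices."""
    vs = sorted(versions)
    return [frozenset(s) for r in range(len(vs) + 1) for s in combinations(vs, r)]


def encode(state, W):
    """Toy encoder (assumption): store the value of the latest version."""
    return W[max(state)] if state else NULL


def decode(states, symbols):
    """Toy decoder (assumption): newest stored value if a common version exists."""
    if not frozenset.intersection(*states):
        return NULL
    newest = max(range(len(states)), key=lambda i: max(states[i]))
    return symbols[newest]


def chi(first_states, T, W, c):
    """Decodable set: non-NULL outputs when the remaining states are subsets of T."""
    out = set()
    for rest in product(subsets(T), repeat=c - len(first_states)):
        states = list(first_states) + list(rest)
        value = decode(states, [encode(s, W) for s in states])
        if value is not NULL:
            out.add(value)
    return out


def aux_vars(W, nu, c):
    """Sketch of Algorithm 1; W maps version index -> distinct message value."""
    ver, serv, seen, S = 1, 1, set(), {}
    Y, Z, A, Pi = [1] * c, [1] * nu, [1] * nu, [None] * nu
    while ver <= nu and serv <= c:
        S[serv] = frozenset(range(1, nu + 1)) - seen                    # Line 7
        T = set(range(1, nu + 1)) - seen                                # Line 8
        first = [S[i] for i in range(1, serv + 1)]
        U = {W[u] for u in T if W[u] in chi(first, T - {u}, W, c)}      # Line 9
        if U:                                                           # Line 10
            v = next(u for u in T if W[u] == max(U))                    # Lines 11-12
            A[ver - 1], Z[ver - 1], Pi[ver - 1] = serv, encode(S[serv], W), v
            seen.add(v)
            ver += 1
        else:
            Y[serv - 1] = encode(S[serv], W)                            # Line 19
            serv += 1
    return Y[:c - 1], Z, A, Pi


Y, Z, A, Pi = aux_vars({1: 10, 2: 30, 3: 20}, nu=3, c=3)
```

For this toy code a single server already reveals Versions 3 and 2, so the run gives `A = [1, 1, 3]` and `Pi = [3, 2, 1]`; Version 1 only becomes decodable once all three servers are included.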

We later show that from the variables $Y_{[3]}, Z_{[3]}, A_{[3]}, \Pi$, one can recover all of the 3 versions $W_{[3]}$.
Here we provide an informal overview of the argument. From Iteration 6 in Fig. 3, we can observe
that $W_1$ is equal to $\psi_{\mathbf{S}}^{([4])}(Y_1, Y_2, Y_3, Z_3)$, where the decoding function is evaluated with states
$S_1 = \{1,2,3\}$, $S_2 = S_3 = \{1,3\}$, $S_4 = \{1\}$. The states $S_1, \ldots, S_4$ can be inferred from $A_{[3]}$ and
$\Pi$. Similarly, from Iteration 5 in Fig. 3, we can observe that $W_3$ is equal to $\psi_{\mathbf{S}}^{([4])}(Y_1, Y_2, Y_3, Z_2)$ with
states $S_1 = \{1,2,3\}$, $S_2 = S_3 = \{1,3\}$, $S_4 = \{1,3\}$. In the converse proof, we will show that, given
$W_1, W_3$, the value $W_2$ can be recovered by using the conditional decoding function as the maximum
value of the set $\chi_{2|\{1,3\}}(\{1,2,3\}, \{1,2,3\}, W_{[3]}) - \{W_1, W_3\}$, which can be evaluated using the values
$W_1, W_3, Y_1, Y_2, Z_1$ as

$$\begin{aligned}
\bigl\{ \psi_{\mathbf{S}'}^{([4])}(X_1, X_2, X_3, X_4) :\ 
& \mathbf{S}' = (\{1,2,3\}, \{1,2,3\}, S'_3, S'_4),\ \forall S'_j \subseteq \{1,3\},\ j \in [3,4], \\
& X_1 = Y_1,\ X_2 = Z_1, \\
& X_m = \varphi_{S'_m}^{(m)}(W_{S'_m}),\ m = 3, 4 \bigr\} - \{W_1, W_3, \text{NULL}\}.
\end{aligned}$$

In particular, we will show that the above set is a singleton set, with the element being $W_2$.


B. Properties of Algorithm 1

We next list some useful and instructive properties of Algorithm 1 before proceeding to formally
prove that AuxVars is invertible.

Property (1) If the last iteration of the while loop begins with VerCount = $\nu$, the server indices
satisfy $1 \leq A_1 \leq \cdots \leq A_\nu \leq c$. Moreover, for any $t \leq \nu$, every iteration of the while loop with
ServCount = $A_t - 1$ has VerCount $\leq t$. Later, in Lemma 4, we show that in every execution of
Algorithm 1, the last iteration of the while loop indeed begins with VerCount = $\nu$.

Property (2) For the iteration of the while loop with VerCount and ServCount, the server states are
set by the algorithm to be

$$S(i) = \begin{cases} [\nu], & i \in [1, A_1 - 1],\ A_1 > 1, \\ [\nu] - \{\Pi(x) : x \in [m]\}, & i \in [A_m, A_{m+1} - 1],\ m \in [\mathrm{VerCount} - 2],\ A_{m+1} > A_m, \\ [\nu] - \{\Pi(x) : x \in [\mathrm{VerCount} - 1]\}, & i \in [A_{\mathrm{VerCount}-1}, \mathrm{ServCount}]. \end{cases} \qquad (15)$$
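
The state assignment of Property (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `server_states` and its argument layout are hypothetical (they do not appear in Algorithm 1), and the permutation $\Pi$ and the indices $A_m$ are passed as plain lists. The sketch uses the fact that, for non-decreasing $A_1 \le A_2 \le \cdots$, server $i$ has lost exactly the versions $\Pi(1), \ldots, \Pi(m)$ where $m$ is the number of recorded indices with $A_m \le i$.

```python
def server_states(nu, perm, first_server, ver_count, serv_count):
    """Sketch of the states S(1), ..., S(serv_count) described in Property (2).

    nu           -- number of versions
    perm         -- perm[m-1] plays the role of Pi(m), m = 1..ver_count-1
    first_server -- first_server[m-1] plays the role of A_m (non-decreasing)
    All names here are illustrative assumptions, not the paper's notation.
    """
    all_versions = set(range(1, nu + 1))
    recorded = first_server[:ver_count - 1]
    states = []
    for i in range(1, serv_count + 1):
        # Number of versions already removed from the state of server i.
        removed = sum(1 for a in recorded if a <= i)
        states.append(all_versions - set(perm[:removed]))
    return states


# States matching the informal overview: Pi = (2, 3, ...), A_1 = 2, A_2 = 4.
print(server_states(3, [2, 3], [2, 4], ver_count=3, serv_count=4))
```

With $\nu = 3$, $\Pi(1) = 2$, $\Pi(2) = 3$, $A_1 = 2$ and $A_2 = 4$, the call reproduces the states $\{1,2,3\}, \{1,3\}, \{1,3\}, \{1\}$ used in the informal overview of the example.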

Property (3) For the iteration of the while loop with $\mathrm{VerCount} = j$, $j \ge 2$, and $\mathrm{ServCount} = k$, let
$S(i)$ be the state of Server $i$, $i \in [k]$. Let $t < j$. Consider the last iteration with $\mathrm{ServCount}
= A_t - 1$. Suppose that, in this iteration, $\mathrm{VerCount} = x$, and $\hat{S}(i)$ is the state of Server $i$,
$i \in [A_t - 1]$. Then for $i \in [A_t - 1]$, the states $S(i)$ are the same as the states $\hat{S}(i)$.

Property (4) At the beginning of an iteration of the while loop, the set VersionsEncountered is
$\{\Pi(1), \ldots, \Pi(\mathrm{VerCount}-1)\}$. The set $T$ indicates the set of versions not encountered, which is
$[\nu] - \{\Pi(1), \ldots, \Pi(\mathrm{VerCount}-1)\}$.

Property (5) In Line 9, $U \subseteq \{W_i : i \in T\}$, where $T = [\nu] - \mathrm{VersionsEncountered}$. Therefore, when
Line 10 returns true, $U \cap \{W_i : i \in \mathrm{VersionsEncountered}\} = \emptyset$. As a consequence, $\Pi(\mathrm{VerCount}) \notin
\{\Pi(1), \ldots, \Pi(\mathrm{VerCount} - 1)\}$. Therefore, if the last iteration of the while loop begins with
$\mathrm{VerCount} = \nu$ and Line 10 returns true in this iteration, then $\Pi$ is indeed a permutation at the
end of the iteration.

Property (6) Consider the last iteration of the while loop of the algorithm where Line 10 returns
true. Suppose this iteration begins with $\mathrm{VerCount} = j$, $\mathrm{ServCount} = A_j$. Then we have for all
$i \in [A_j - 1]$,

$$Y_i = \varphi_{S(i)}^{(i)}(W_{S(i)}),$$

where $S(i)$ is specified in Property (2) with $\mathrm{VerCount} = j$, $\mathrm{ServCount} = A_j$. Moreover, for all
$i \in [j]$,

$$Z_i = \varphi_{S'(A_i)}^{(A_i)}(W_{S'(A_i)}),$$

where $S'(A_i) = \{\Pi(i), \ldots, \Pi(\nu)\}$.

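Property (7) Note that when Line 10 returns true, even though $W_{[\nu]}$ is an input to the function
$\chi_{\mathrm{ServCount}|T-\{u\}}$ in Line 9, for every $u \in T$, we can generate the output of the function only from
$S(1), \ldots, S(\mathrm{ServCount})$, $\{W_i : i \in T - \{u\}\}$, $Y_m = \varphi_{S(m)}^{(m)}(W_{S(m)})$, $1 \le m \le \mathrm{ServCount} - 1$, and
$Z_{\mathrm{VerCount}} = \varphi_{S(\mathrm{ServCount})}^{(\mathrm{ServCount})}(W_{S(\mathrm{ServCount})})$. In particular,

$$\chi_{\mathrm{ServCount}|T-\{u\}}\big(S(1), S(2), \ldots, S(\mathrm{ServCount}), W_{[\nu]}\big)$$

$$= \Big\{ \psi_{\mathbf{S}'}^{([c])}(X_1, X_2, \ldots, X_c) \ne \mathrm{NULL} :$$

$$\mathbf{S}' = (S(1), \ldots, S(\mathrm{ServCount}), S'_{\mathrm{ServCount}+1}, \ldots, S'_c),\ \forall S'_j \subseteq T - \{u\},\ j \in [\mathrm{ServCount}+1, c],$$

$$X_m = Y_m,\ 1 \le m \le \mathrm{ServCount} - 1,$$

$$X_m = Z_{\mathrm{VerCount}},\ m = \mathrm{ServCount},$$

$$X_m = \varphi_{S'_m}^{(m)}(W_{S'_m}),\ \mathrm{ServCount} + 1 \le m \le c \Big\}. \qquad (16)$$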

We use this property in showing that AuxVars is one-to-one. 

We next state Lemmas 3 and 4. Statement (i) of Lemma 3 is useful in proving Lemma 4. Statement
(ii) of Lemma 3 is useful in the proof of Theorem 2, in particular, in inverting AuxVars to obtain
$W_{[\nu]}$ from $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$. Lemma 4 shows that at the beginning of the last iteration in the
algorithm, the variable VerCount is equal to $\nu$. This means that, for each of the $\nu$ versions, Line
10 returns true in some iteration of the while loop.

Lemma 3 (i) Consider any execution of AuxVars and consider an iteration of the while loop. After Line
8 is executed in the while loop, the following statement is true for any $u \in T$:

$$\chi_{\mathrm{ServCount}|T-\{u\}}\big(S(1), S(2), \ldots, S(\mathrm{ServCount}), W_{[\nu]}\big) \subseteq \{W_i : i \in T\}, \qquad (17)$$

where ServCount represents the value at the beginning of the while loop iteration.

(ii) Consider any execution of AuxVars where the final iteration of the while loop begins with $\mathrm{VerCount} =
k$. For any $t \in [k]$, we have

$$\{W_{\Pi(t)}\} = \chi_{A_t|T-\{\Pi(t)\}}\big(S(1), \ldots, S(A_t), W_{[\nu]}\big) - \{W_i : i \in T - \{\Pi(t)\}\},$$

where $T = \{\Pi(t), \Pi(t+1), \ldots, \Pi(\nu)\}$, and $S$ is defined as in Property (2) with $\mathrm{VerCount} = t$, $\mathrm{ServCount}
= A_t$.

Proof: (i) Note that $T = [\nu] - \mathrm{VersionsEncountered}$, and $\mathrm{VersionsEncountered} = \{\Pi(1), \Pi(2),
\ldots, \Pi(\mathrm{VerCount} - 1)\}$ by Property (4). Notice that when $\mathrm{VerCount} = 1$, the claim is satisfied
automatically because $T = [\nu]$.

We prove the claim by contradiction. Suppose that at $\mathrm{ServCount} = k$, $k \in [c]$, and $\mathrm{VerCount} = j$, $j \in [2, \nu]$,
equation (17) is violated, and


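$$W_{\Pi(t)} \in \chi_{k|T-\{u\}}\big(S(1), S(2), \ldots, S(k), W_{[\nu]}\big), \qquad (18)$$

for some $t \in [j - 1]$. Let $\mathbf{S}$ be the state vector of length ServCount specified in Property (2), and
$T - \{u\} = [\nu] - \{\Pi(1), \ldots, \Pi(j-1)\} - \{u\}$. By the definition of the decodable set function, there exists
a state $\mathbf{S}'$ that decodes $\Pi(t)$ from the first $c$ servers,

$$\psi_{\mathbf{S}'}^{([c])}\big(\varphi_{S'(1)}^{(1)}(W_{S'(1)}), \ldots, \varphi_{S'(c)}^{(c)}(W_{S'(c)})\big) = W_{\Pi(t)}, \qquad (19)$$

such that

$$S'(i) \begin{cases} = S(i), & i \in [k], \\ \subseteq [\nu] - \{\Pi(1), \ldots, \Pi(j-1)\} - \{u\}, & i \in [k+1, c]. \end{cases} \qquad (20)$$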


If $A_t = 1$, then by Property (2), $S'(i) \subseteq [\nu] - \{\Pi(1), \ldots, \Pi(t)\}$ for all $i \in [c]$. By Remark 1
in Section II, we know $\psi_{\mathbf{S}'}^{([c])}$ should return a value corresponding to a version in $\bigcup_{i \in [c]} S'(i) \subseteq
[\nu] - \{\Pi(1), \ldots, \Pi(t)\}$, which contradicts (19). So we assume $A_t > 1$.

Consider the last iteration with $\mathrm{ServCount} = A_t - 1$. Let $\mathrm{VerCount} = x$ in this iteration. By
Property (1), we know that $x \le t$. We will show that if (18) holds, then, in this iteration, Line
10 returns true. Therefore, in this iteration, $\mathrm{VerCount} \leftarrow x + 1$ and $\mathrm{ServCount} = A_t - 1$ remains
unchanged. This contradicts our assumption that the last iteration with $\mathrm{ServCount} = A_t - 1$ has
$\mathrm{VerCount} = x$. So, to complete the proof, it suffices to show that Line 10 returns true in this
iteration. We show this next.

For the iteration of the while loop with $\mathrm{ServCount} = A_t - 1$ and $\mathrm{VerCount} = x$,

• let $\hat{\mathbf{S}}$ be the server states as in Property (2),

• let $\hat{T} = [\nu] - \{\Pi(1), \ldots, \Pi(x-1)\} = \{\Pi(x), \ldots, \Pi(\nu)\}$ be the set in Line 8,

• let $\hat{U} = \big\{W_u : W_u \in \chi_{\mathrm{ServCount}|\hat{T}-\{u\}}\big(\hat{S}(1), \hat{S}(2), \ldots, \hat{S}(\mathrm{ServCount}), W_{[\nu]}\big),\ u \in \hat{T}\big\}$ be the set in
Line 9.

Here note that $\Pi(t) \in \hat{T}$ because $x \le t$ by Property (1). By Property (2), Property (3), and
(20), we note that

$$S'(i) \begin{cases} = \hat{S}(i) = S(i), & i \in [A_t - 1], \\ \subseteq \hat{T} - \{\Pi(t)\}, & i \in [A_t, c]. \end{cases} \qquad (21)$$


Combining (21) and (19), we know that $W_{\Pi(t)}$ is in the decodable set function at $\mathrm{VerCount} = x$ and
$\mathrm{ServCount} = A_t - 1$ using the state $\mathbf{S}'$, that is,

$$W_{\Pi(t)} \in \chi_{A_t-1|\hat{T}-\{\Pi(t)\}}\big(\hat{S}(1), \hat{S}(2), \ldots, \hat{S}(A_t - 1), W_{[\nu]}\big),$$

which combined with the fact that $\Pi(t) \in \hat{T}$ implies $\hat{U} \ne \emptyset$ and Line 10 returns true. Thus $\mathrm{VerCount} \leftarrow
x + 1$ and $\mathrm{ServCount} = A_t - 1$ should stay unchanged. Hence we get a contradiction.

(ii) Consider the iteration when $\Pi(\mathrm{VerCount})$ is set in Line 15, which has $\mathrm{VerCount} = t$, $\mathrm{ServCount}
= A_t$. After Line 8 is executed, we have $T = \{\Pi(t), \Pi(t+1), \ldots, \Pi(\nu)\}$. We know Line 10 returns
true, and by Line 11, $W_{\Pi(t)} = \max U$. Thus letting $u = \Pi(t) \in T$, we have

$$W_u \in \chi_{A_t|T-\{u\}}\big(S(1), S(2), \ldots, S(A_t), W_{[\nu]}\big).$$

Moreover, $W_u \notin \{W_i : i \in T - \{u\}\}$. Combined with statement (i), we obtain the desired statement.


Lemma 4 In any execution of Algorithm 1, at the beginning of the final iteration of the while loop, we
have $\mathrm{VerCount} = \nu$, and Line 10 returns true in that iteration.

Proof: We prove the lemma by contradiction. Suppose that, in an execution of Algorithm 1, the
last iteration begins with $\mathrm{VerCount} = j$, for some $j \le \nu$. This means we have found a set of versions
in the set VersionsEncountered such that Line 10 is not satisfied for $\mathrm{VerCount} = j$. Furthermore,
knowing $\mathrm{VersionsEncountered} = \{\Pi(1), \ldots, \Pi(j-1)\}$, we note that Line 10 is satisfied for version
$\Pi(i)$, when ServCount is equal to $A_i$ for $i \in [j - 1]$.

Consider the last iteration of the while loop, with $\mathrm{VerCount} = j$, $\mathrm{ServCount} = c$. The states of the first
$c$ servers are shown in Property (2). Note that there exists a latest common version among these
$c$ server states, which is $\max([\nu] - \{\Pi(1), \ldots, \Pi(j-1)\})$. Therefore, the decoding function
returns a non-null value. Specifically, for some $u \in [\nu]$, we have

$$\psi_{\mathbf{S}}^{([c])}\big(\varphi_{S(1)}^{(1)}(W_{S(1)}), \varphi_{S(2)}^{(2)}(W_{S(2)}), \ldots, \varphi_{S(c)}^{(c)}(W_{S(c)})\big) = W_u.$$

Furthermore, because $\mathrm{ServCount} = c$, we have

$$\chi_{\mathrm{ServCount}|T-\{u\}}\big(S(1), \ldots, S(\mathrm{ServCount}), W_{[\nu]}\big)
= \big\{\psi_{\mathbf{S}}^{([c])}\big(\varphi_{S(1)}^{(1)}(W_{S(1)}), \ldots, \varphi_{S(c)}^{(c)}(W_{S(c)})\big)\big\} = \{W_u\}.$$

The above result combined with statement (i) of Lemma 3 implies that the set $U \ne \emptyset$ and Line 10
is satisfied, which contradicts our assumption. This completes the proof. ■


C. Proof of Theorem 2 


Now we are ready to prove the lower bound in Theorem 2. 

Proof of Theorem 2 for general $\nu$: Suppose there is a $(c, c, \nu, M, q)$ multi-version code. Run
Algorithm 1 on every $\nu$-tuple of distinct version values $W_{[\nu]}$. By Lemma 4, we know the algorithm
terminates with $\mathrm{VerCount} = \nu$ and $1 \le A_1 \le \cdots \le A_\nu \le c$. We use a dummy variable $A_0 = 1$.
First, since the algorithm is deterministic, we know there is a mapping AuxVars from $W_{[\nu]} \in \mathcal{W}$
to $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$. Next, we show that AuxVars is a one-to-one mapping, that is, we create a
mapping from $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$ to $W_{[\nu]} \in \mathcal{W}$.

We now describe how to obtain $W_{[\nu]} \in \mathcal{W}$ from the output of AuxVars. In particular, for any $t \in
\{1, 2, \ldots, \nu\}$, we describe a procedure to obtain $W_{\Pi(t)}$ from $Y_1, Y_2, \ldots, Y_{A_t - 1}, Z_t$ and $W_{\Pi(t+1)}, W_{\Pi(t+2)},
\ldots, W_{\Pi(\nu)}$. The procedure automatically implies that we can obtain $W_{[\nu]}$ from $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}$, and
$\Pi$. For any realization of distinct values $W_{[\nu]} \in \mathcal{W}$, if we are given $\Pi$, we can set the state to
be

$$S(i) = \begin{cases} [\nu] - \{\Pi(1), \ldots, \Pi(j-1)\}, & i \in [A_{j-1}, A_j - 1],\ \forall j \in [t], \\ [\nu] - \{\Pi(1), \ldots, \Pi(t-1)\}, & i = A_t. \end{cases}$$


By Property (2), the above states are the same as the states in Algorithm 1 in the iteration of the while
loop with $\mathrm{VerCount} = t$, $\mathrm{ServCount} = A_t$. Note that at that iteration, by Property (6) we know
$Y_{[A_t - 1]}, Z_t$ are the values of Servers $[A_t]$, which correspond to the above states $S(1), \ldots, S(A_t)$.
That is, $Y_i = \varphi_{S(i)}^{(i)}(W_{S(i)})$ for $i \in [A_t - 1]$ and $Z_t = \varphi_{S(A_t)}^{(A_t)}(W_{S(A_t)})$. Let $T = [\nu] - \{\Pi(1), \ldots, \Pi(t-1)\}$.
Thus, $W_{T - \{\Pi(t)\}} = W_{\{\Pi(t+1), \ldots, \Pi(\nu)\}}$. From Lemma 3 (ii), we know that

$$\{W_{\Pi(t)}\} = \chi_{A_t|T-\{\Pi(t)\}}\big(S(1), S(2), \ldots, S(A_t), W_{[\nu]}\big) - \{W_i : i \in T - \{\Pi(t)\}\}.$$

Fig. 4. System architecture of a toy model with one write client and many read clients.

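Therefore, to obtain $W_{\Pi(t)}$, it suffices to evaluate the set

$$\chi_{A_t|T-\{\Pi(t)\}}\big(S(1), S(2), \ldots, S(A_t), W_{[\nu]}\big) - \{W_i : i \in T - \{\Pi(t)\}\}.$$

Property (7) states that the above set can be computed using $Y_{[A_t - 1]}, Z_t, W_{T - \{\Pi(t)\}}$ via equation
(16). Therefore, we can compute $W_{\Pi(t)}$ as

$$\{W_{\Pi(t)}\} = \Big\{ \psi_{\mathbf{S}'}^{([c])}(X_1, X_2, \ldots, X_c) \ne \mathrm{NULL} :$$

$$\mathbf{S}' = (S(1), \ldots, S(A_t), S'_{A_t+1}, \ldots, S'_c),\ \forall S'_j \subseteq T - \{\Pi(t)\},\ j \in [A_t + 1, c],$$

$$X_m = Y_m,\ 1 \le m \le A_t - 1,$$

$$X_m = Z_t,\ m = A_t,$$

$$X_m = \varphi_{S'_m}^{(m)}(W_{S'_m}),\ A_t + 1 \le m \le c \Big\}$$

$$- \{W_i : i \in T - \{\Pi(t)\}\}.$$

Therefore, given $W_{\{\Pi(t+1), \ldots, \Pi(\nu)\}}$, $Y_{[A_t - 1]}$, $Z_t$, $\Pi$, we see that the value of $W_{\Pi(t)}$ is determined.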

Noting that $A_i \le c$ for all $i \in [\nu]$ by Lemma 4, we infer that, given $Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$, we
can determine values of $W_{\Pi(\nu)}, W_{\Pi(\nu-1)}, \ldots, W_{\Pi(1)}$ one by one. Hence we have a mapping from
$Y_{[c-1]}, Z_{[\nu]}, A_{[\nu]}, \Pi$ to $W_{[\nu]} \in \mathcal{W}$, implying that AuxVars is one-to-one. Then we know (12) and
hence (13) are satisfied, thus the theorem is proved. ■

Remark 4 If $W_{[\nu]}$ is not uniformly distributed, then the proof of Theorem 2 can be appropriately modified
to obtain a lower bound on the storage size per server using (13):

C + V 

The above bound can be much smaller compared to the one in Theorem 2, if for example, the versions are
dependent and have some structure. In particular, the storage cost lower bound would be much smaller
if, because of the dependency of the versions, $H(W_{[\nu]} \mid W_{[\nu]} \in \mathcal{W}) \ll \log |\mathcal{W}|$. In this case, it is an
open problem to study codes that exploit the dependency amongst versions to obtain a smaller storage
cost.


VII. Toy Model of Distributed Storage 

The multi-version coding problem has no temporal aspect in its formulation. Here, we study a
toy model of a storage system that evolves over time and demonstrate the potential of multi-version
codes for an asynchronous setting. In particular, we explicitly describe an arrival model for new
versions and channel models for the links between the encoders, servers and decoders. Our study
of the toy model establishes a physical interpretation for the parameter $\nu$ of multi-version coding.
In particular, our toy model demonstrates the connection between the parameter $\nu$ and the degree
of asynchrony in a storage system. We begin with a description of our model.

Consider a distributed storage system with $N$ servers, a write client that generates different
versions of the message, and read clients that aim to read the message versions (see Fig. 4). The
write client aims to store the new version of the message in a distributed storage system consisting
of the servers. We aim to design server storage strategies that implement a consistent distributed
storage system, and are tolerant to $f$ server failures. We now describe toy models for the channels
between the clients and the servers, and the arrival model at the clients.

Message arrival model: In the toy model here, we assume that a new version of the message
appears at the write client in every time slot. The message at time slot $t$ is denoted as $W_t$, $t \in \mathbb{N}_+$,
where $W_t \in [M]$. At time slot $t$, the write client sends a packet containing the time stamp and the
message, that is, a packet with $(t, W_t)$, to every server.

Channel Model: We describe two models for the channels between the clients and the servers. 
The first model is a delay based model, and the second model is an erasure based model. 

Delay model: We assume that a packet sent by the write client at time slot $t$ to a
server arrives at the server in any one of the time slots $\{t, t+1, \ldots, t+T-1\}$. In other words, the
sent message has a delay that can be any one of $0, 1, 2, \ldots, T-1$. The delay is not known a priori,
and can be different for different packets. It is useful to note that even if the same packet is sent in
the same time slot to different servers, the delay can be different for different servers.

Note that in the message arrival model, for every time slot $t$, there is a message $(t, W_t)$ sent to
every server. Let $S_m^{(t)} \subseteq \{1, 2, \ldots, t\}$ denote the set of versions received by server $m$ at time $t$. Then
in the delay model

$$S_m^{(t)} = \big(\{t - T + 1\} \cup R_m^{(t)}\big) - \bigcup_{j=1}^{t-1} S_m^{(j)}, \qquad (22)$$

where $R_m^{(t)}$ is some arbitrary subset of $\{t - T + 2, \ldots, t\}$. The arrival time sets $S_m^{(t)}$ at the servers are
not known a priori. Our model is adversarial, that is, we want our decoding constraints (specified
below) to be satisfied for all possible arrival time sets $S_m^{(t)}$ that are of the form (22).
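
As a rough illustration of the delay model, the following Python sketch draws one arrival pattern and exposes the per-slot arrival sets. The model in the text is adversarial, so random sampling only produces one admissible pattern rather than the worst case; the function names `delay_model_arrivals` and `received_by` and the zero-based server indexing are assumptions made for this sketch.

```python
import random


def delay_model_arrivals(num_servers, horizon, T, rng):
    """Sample one arrival pattern of the delay model (illustrative only).

    arrivals[m][t] is the set of versions that first reach server m in
    slot t. Version v is sent in slot v and arrives after a delay drawn
    from {0, ..., T-1}, independently per server and per packet.
    """
    arrivals = [{t: set() for t in range(1, horizon + T)}
                for _ in range(num_servers)]
    for m in range(num_servers):
        for version in range(1, horizon + 1):
            delay = rng.randrange(T)  # any value in 0..T-1 is admissible
            arrivals[m][version + delay].add(version)
    return arrivals


def received_by(arrivals_m, t):
    """All versions that have reached one server by the end of slot t."""
    return set().union(*(arrivals_m[s] for s in range(1, t + 1)))


rng = random.Random(0)
pattern = delay_model_arrivals(num_servers=3, horizon=20, T=4, rng=rng)
# By slot t, every server holds version t - T + 1, whatever the delays were.
print(all(10 - 4 + 1 in received_by(pattern[m], 10) for m in range(3)))
```

In every sampled pattern the versions arriving in slot $t$ lie in $\{t-T+1, \ldots, t\}$, and version $t-T+1$ has arrived at every server by slot $t$, which is the structure that (22) encodes.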

Erasure Model: In the erasure model for the channel, a packet sent by the write client to the server
may be erased. There is no packet delay in the erasure model. Our erasure model is adversarial,
with the following packet delivery guarantee: for every subset of $N - f$ servers, for any consecutive
$T$ packets, there is at least one packet such that it arrives at all the $N - f$ servers.
Mathematically, the set of received packet versions $S_m^{(t)}$ at server $m$ at time $t$ is

s (t) = f {f} if 3ni, n 2 , • • •, n N - f -1 G [ra] s.t. [fRt-T+i n flL"/” 1 s nJ = <t> (2 3) 

| {t} or </) otherwise 
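
The delivery guarantee of the erasure model, as stated in words above, can be checked for a given erasure pattern with a short Python sketch. The function name `satisfies_erasure_guarantee`, the representation of a pattern as one set of delivered time slots per server, and the finite `horizon` argument are all assumptions made for illustration.

```python
from itertools import combinations


def satisfies_erasure_guarantee(received, f, T, horizon):
    """Check the packet delivery guarantee of the erasure model.

    received[m] -- set of time slots (1..horizon) whose packet reached server m
    Returns True if, for every subset of N - f servers and every window of
    T consecutive slots, some packet in the window reached all of them.
    """
    n = len(received)
    for subset in combinations(range(n), n - f):
        # Slots whose packet was delivered to every server in the subset.
        common = set.intersection(*(received[m] for m in subset))
        for start in range(1, horizon - T + 2):
            window = set(range(start, start + T))
            if not (common & window):
                return False
    return True


# Three servers, one tolerated failure, windows of T = 2 slots.
print(satisfies_erasure_guarantee([{1, 3}, {1, 3}, {1, 3}], f=1, T=2, horizon=4))
```

The check enumerates all $\binom{N}{N-f}$ server subsets, so it is meant for small toy instances only.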

Encoding requirements: At time $t$, for every $m \in [N]$, server $m$ stores a symbol $X_m^{(t)}$ which is
a function of the stored symbol $X_m^{(t-1)}$ and the received packets $W_{S_m^{(t)}}$. We assume that $X_m^{(t)} \in [q]$
for every value of $t, m$. As usual, for a given encoding scheme, we measure its storage cost as $\frac{\log q}{\log M}$.

Decoding requirements: We intend to design for a failure tolerance of $f$ servers. In our decoding
requirement, a read client accesses any subset of $N - f$ servers and is required to decode the latest
common message version among the $N - f$ servers, or the message corresponding to a later version.
Our model is adversarial, that is, in the delay model, we want the read client to decode the latest
common version for every possible packet arrival pattern at the servers that satisfies (22). Similarly,
in the erasure model, we intend the read client to decode the latest common version for every possible
packet arrival pattern that satisfies (23). In our toy model we assume that the links between the
servers and read clients are perfect, that is, there is no erasure or delay for the packets between the
servers and the read client.

It is instructive to note that both the erasure model and the delay model ensure that, at time
$t$, there is a latest common version for all the servers among the versions $\{t, t-1, \ldots, t-T+1\}$.
Therefore, under our decoding requirements, a read client that aims to read from the storage system
at time slot $t$ must decode a message in $\{W_{t-T+1}, W_{t-T+2}, \ldots, W_t\}$.
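
The decoding requirement can be made concrete with a small Python sketch that, given the sets of versions held by the servers a read client contacts, computes the latest common version and the set of versions the client is allowed to return. The helper names `latest_common_version` and `acceptable_versions` are hypothetical and are introduced only for this illustration.

```python
def latest_common_version(states):
    """Latest version present at every contacted server, or None if none."""
    common = set.intersection(*states)
    return max(common) if common else None


def acceptable_versions(states):
    """Versions a read client may return: the latest common version or any
    later version that at least one contacted server has received."""
    latest = latest_common_version(states)
    if latest is None:
        return set()
    return {v for v in set.union(*states) if v >= latest}


# Three contacted servers holding different subsets of versions 1..3.
quorum = [{1, 2, 3}, {1, 3}, {1}]
print(latest_common_version(quorum), acceptable_versions(quorum))
```

For the quorum above, version 1 is the latest common version, so returning any of the versions 1, 2 or 3 satisfies the requirement.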

We will see that the multi-version coding problem can be used to reveal the fundamental storage
cost performance of this setting. In particular, our achievability and converse results for an $(n, c, \nu)$
multi-version code provide insights on the storage cost in our setting, where $n = N$, $c = N - f$
and $T = \nu$.

Claim 2 Consider a setting where every new message version at the write client takes values from the
set $[M]$. If $T \mid (N - f - 1)$, then a storage cost of

$$\log q = \frac{T}{T + N - f - 1} \log M + o(\log M)$$

is achievable.

Proof: The proof is the same for both the erasure model and the delay model. Observe that,
in both models, at time $t$, there is a latest common version among all the servers in $[t - T + 1, t]$.
As per Construction 2 for an $(N, T, N - f)$ multi-version code, each server stores bits of

the latest version it has received. Along the same lines as the proof of Theorem 4, we infer that 
any read client that connects to N — f servers at time t gets at least codeword 

symbols corresponding to at least one version $u^*$, where $u^* \in [t - T + 1, t]$ is the latest common
version or a later version among the $N - f$ servers. Therefore, version $u^*$ is decodable by the read
client which reads at time $t$. There is a storage overhead of $o(\log_2 M)$ bits since the servers need
to store the time stamp 2 of the version along with the codeword symbol. ■

Claim 3 Consider a setting where every new message version at the write client takes values from the
set $[M]$. The storage cost of any server storage strategy for the delay model satisfies

$$\frac{\log q}{\log M} \ge \frac{T - 1}{N - f + T - 2} - \frac{\log\left((T-1)^{T-1} \binom{N - f + T - 2}{T - 1}\right)}{(N - f + T - 2) \log M}.$$

Claim 4 Consider a setting where every new message version at the write client takes values from the
set $[M]$. The storage cost of any server storage strategy for the erasure model satisfies

$$\frac{\log q}{\log M} \ge \frac{T}{N - f + T - 1} - \frac{\log\left(T^{T} \binom{N - f + T - 1}{T}\right)}{(N - f + T - 1) \log M}.$$
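
To get a feel for the numbers, the following Python sketch evaluates the normalized storage costs for small parameters. It rests on assumptions that should be checked against Theorem 2 and Construction 2: the achievable cost of Claim 2 is read as $T/(N-f+T-1)$ (ignoring the $o(\log M)$ term), and the bounds of Claims 3 and 4 are read as the Theorem 2 expression $\frac{\nu}{c+\nu-1} - \frac{\log(\nu^\nu \binom{c+\nu-1}{\nu})}{(c+\nu-1)\log M}$ with $c = N - f$ and $\nu = T - 1$ or $\nu = T$ respectively. All function names are hypothetical.

```python
from math import comb, log2


def achievable_cost(N, f, T):
    """Normalized cost log q / log M read from Claim 2 (o(log M) term dropped)."""
    c = N - f
    if (c - 1) % T != 0:
        raise ValueError("Claim 2 assumes that T divides N - f - 1")
    return T / (c + T - 1)


def converse_bound(c, nu, log2_M):
    """Assumed form of the Theorem 2 lower bound on log q / log M."""
    penalty = log2(nu ** nu * comb(c + nu - 1, nu)) / ((c + nu - 1) * log2_M)
    return nu / (c + nu - 1) - penalty


def delay_lower_bound(N, f, T, log2_M):
    # Claim 3 reading: nu = T - 1 versions are free; assumes T >= 2.
    return converse_bound(N - f, T - 1, log2_M)


def erasure_lower_bound(N, f, T, log2_M):
    # Claim 4 reading: nu = T versions are free.
    return converse_bound(N - f, T, log2_M)


N, f, T, log2_M = 5, 1, 3, 10 ** 6
print(achievable_cost(N, f, T))
print(erasure_lower_bound(N, f, T, log2_M), delay_lower_bound(N, f, T, log2_M))
```

Under these assumptions, for $N = 5$, $f = 1$, $T = 3$ and a large message alphabet, the erasure-model bound sits just below the achievable cost of $1/2$, while the delay-model bound is close to $2/5$.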


Claims 3 and 4 are essentially corollaries to Theorem 2. In particular, for both the delay model
and the erasure model, we note that the server encoding functions necessarily implement a
multi-version code. We provide brief sketches of their proofs here.

2 In fact, a server that stores a codeword symbol corresponding to message $W_t$ can store the time stamp as $t \bmod 2T$. We
omit mechanical details of the proof here.

Proof of Claim 3: Consider an arbitrary collection of subsets $A_1, A_2, \ldots, A_n \subseteq [t - T + 2, t]$.
Assume that at time $t - T$, all the packets from the write client to the servers are delivered. At
times $t - T + 1, \ldots, t - 1$, no packet is delivered by the channel. At time $t$, server $m$ receives packets
$A_m \cup \{t - T + 1\}$. Given the versions $1, 2, \ldots, t - T + 1$, the server encoding functions at time $t$ form
a multi-version code over the versions in $[t - T + 2, t]$. Since we started with an arbitrary collection
of subsets $A_1, A_2, \ldots, A_n$, the worst storage cost over all collections of subsets is lower bounded by
the cost described in Theorem 2 for $\nu = T - 1$, $c = N - f$. This completes the proof. ■

Remark 5 The worst case states described in the converse of Theorem 2, when applied to the delay model,
may implicitly require the packets sent to be delivered out of order. This is because, when the converse
of Theorem 2 is applied in our proof of Claim 3, we may require a server $m$ to contain a version $t_1$
but not contain a version $t_2$, where $t_2 < t_1$, $t_1, t_2 \in [t - T + 2, t]$. Our converse for the delay model
is therefore more relevant to applications where the versions may be sent in one order, and received in
another. The assumption that the order of packets may change is the basis of certain transport protocols
such as Stream Control Transmission Protocol (SCTP) [24].

Proof of Claim 4: Consider an arbitrary collection of subsets $A_1, A_2, \ldots, A_n \subseteq [t - T + 1, t]$
such that, for every collection of $c$ subsets in $A_1, A_2, \ldots, A_n$, there is a common version. Assume
that at time $t_0$ in $[t - T + 1, t]$, server $m \in [n]$ receives the packet sent by the write client if and only
if $t_0 \in A_m$. Given the messages $W_{[t-T]}$, the server encoding strategy at times $[t - T + 1 : t]$ forms
a multi-version code. Since we started with an arbitrary collection of subsets $A_1, A_2, \ldots, A_n$, the
worst storage cost over all collections of subsets is lower bounded by the cost described in Theorem
2 for $\nu = T$, $c = N - f$. This completes the proof. ■

The parameter $\nu$ is analogous to the parameter $T$ in both the erasure and the delay models. The
parameter $T$ is, intuitively speaking, a measure of the degree of asynchrony in the system. Our
toy models therefore establish an explicit connection between the parameter $\nu$ and the degree of
asynchrony in the storage system. A multi-version code with a larger value for the parameter $\nu$ can
tolerate a greater degree of asynchrony, albeit at a larger storage cost.

VIII. Concluding Remarks 

In this paper, we have proposed the multi-version coding problem, where the goal is to encode
various versions in a distributed storage system so that the latest version is decodable. We have
given a lower bound on the worst-case storage cost and provided a simple coding scheme that is
essentially optimal for an infinite family of parameters. Our problem formulation and solution are
a step towards the study of consistent key-value stores from an information theoretic perspective.
The multi-version coding problem affords a number of interesting generalizations which are relevant
to practical consistent distributed storage systems. We discuss some of these generalizations next.

• A useful direction of future work is to study the problem beyond a worst-case setting, for
instance, through analysing a restricted set of states. For example, one can assume that the
servers always get consecutive versions: $S(i) = [x, y]$, for some $1 \le x \le y \le \nu$. One can similarly
assume that, due to network constraints, certain versions are dispersed only to a subset of the
servers, namely, $x \notin S(i)$ for all $i \in I$, where $I \subseteq [n]$ is some subset of server indices. More
generally, our problem could be formulated in terms of storage cost per server per state, and
the overall storage cost can be optimized based on the workload distributions of the servers.

• Our problem formulation assumes that the number of versions, $\nu$, is known a priori. An
interesting direction of future work is to adapt our problem formulation and solutions to
incorporate a setting where this parameter is not known.

• Our problem formulation essentially views different versions as being independent. However,
in several applications, it is conceivable that different versions are correlated. For dependent
versions, Remark 4 suggests that the converse of Theorem 2 would be applicable after appropriate
manipulations. Developing code constructions that exploit dependency in the versions
is an interesting area of future work. The ideas of [14], [16] can be useful in this endeavor.

• The framework of our toy model can be developed to study more realistic scenarios. The
first step would be to incorporate asynchrony/erasures in the read client. The end goal of
the framework would be to understand costs in realistic storage systems, or over models
studied in distributed algorithms literature. The standard model in distributed algorithms
can be viewed as the delay model of Section VII in the limiting case of asymptotically large $T$.
Furthermore, in distributed computing theory, write and read clients, and servers are modeled
as automata (more precisely, input-output automata [1], [25]), and the goal is to design client
and server protocols that ensure consistency. Developments of our toy model, and appropriate
refinements to multi-version coding, can potentially provide information theoretic insights into
the storage cost of such systems.

• Minimizing communication costs and latency are important requirements of modern consistent
storage services. Refining multi-version coding and our toy model for channels to
incorporate these requirements is an important direction of future work. In particular, the tools
used in references [14], [15], [16], [17], when appropriately adapted to multi-version coding, may
help reduce latency by reducing the amount of information transmitted to disperse and update
information related to a new version.